ITR: The ARCHway Project

Kevin Kiernan, PI
Department of English
University of Kentucky

Alex Dekhtyar, co-PI
Department of Computer Science
University of Kentucky

Jerzy W. Jaromczyk, co-PI
Department of Computer Science
University of Kentucky

Dorothy C. Porter
Research in Computing for Humanities
University of Kentucky

Contact Information

Kevin Kiernan
Department of English
1201 Patterson Office Tower
University of Kentucky
Lexington, KY, 40506
Phone: (912) 634 2803
Fax: (859) 323 1072
Email: kiernan@uky.edu
URL: http://www.uky.edu/AS/English/faculty/ksk.html

Alex Dekhtyar
Department of Computer Science
773 Anderson Hall
University of Kentucky
Lexington, KY, 40506
Phone: (859) 257 1839
Fax: (859) 323 1971
Email: dekhtyar@cs.uky.edu
URL: http://www.cs.uky.edu/~dekhtyar

Jerzy W. Jaromczyk
Department of Computer Science
773 Anderson Hall
University of Kentucky
Lexington, KY, 40506
Phone: (859) 257 1186
Fax: (859) 323 1971
Email: jurek@cs.uky.edu
URL: http://www.cs.uky.edu/~jurek

Dorothy C. Porter
Research in Computing for Humanities
Department of English
3-51/3-52 Young Library
University of Kentucky
Lexington, KY, 40506
Phone: (859) 257 9549
Fax: (859) 323 1072
Email: dporter@uky.edu
URL: http://rch.uky.edu/

List of Supported Students and Staff

Other collaborators

Project Award Information

Project Summary

The ARCHway Project: Architecture for Research in Computing for the Humanities through Collaborative Research, Teaching, and Learning has brought together humanities researchers and computer scientists to implement a modular, extensible Edition Production Technology (EPT), based on an open, standards-based architecture that incorporates cutting-edge XML technologies and XML-Java applications. The EPT provides humanities scholars with new techniques and tools for producing image-based electronic editions of culturally significant artifacts. To build the EPT, we established a technological and pedagogical infrastructure for collaborative research and teaching between computer science and the humanities. The project has involved scholars and students from both English and Computer Science, working together to identify and solve problems of mutual interest while creating both the EPT and image-based electronic scholarly editions built with its tools, applications, and technologies. In the process, the project has established a model for collaborative research and teaching by developing innovative methodologies that give students in both computer science and the humanities hands-on experience contributing to a major research project.

Publications and Products


[1] Kevin Kiernan, "The nathwylc Scribe and the Beowulf Palimpsest," for a festschrift in honor of Helen Damico, ed. Catherine Karkov and Nancy van Deusen, 2004
[2] Kevin Kiernan, Alex Dekhtyar, Jerzy W. Jaromczyk, Dorothy C. Porter, Ionut E. Iacob, "Edition Production Technology (EPT) and the ARCHway Project," forthcoming in DigiCULT Newsletter
[3] Kevin Kiernan, Jerzy W. Jaromczyk, Alex Dekhtyar, Dorothy Carr Porter, Kenneth Hawley, Sandeep Bodapati, Ionut Emil Iacob, The ARCHway Project: Architecture for Research in Computing for Humanities through Research, Teaching and Learning, (2004), accepted to Literary and Linguistic Computing.
[4] Ionut E. Iacob, Alex Dekhtyar, Kazuyo Kaneko, Parsing Concurrent XML. Accepted, 6th ACM International Workshop on Web Information and Data Management (WIDM 2004), November 12-13, 2004, Washington, DC.
[5] Jerzy W. Jaromczyk, Miroslaw Kowaluk, Neil Moore, A web interface to image-based concurrent markup using image maps. Accepted, 6th ACM International Workshop on Web Information and Data Management (WIDM 2004), November 12-13, 2004, Washington, DC.
[6] Alex Dekhtyar and Ionut E. Iacob. A Framework For Management of Concurrent XML Markup. Accepted, Data and Knowledge Engineering, 2004
[7] Ionut E. Iacob, Alex Dekhtyar and Michael I. Dekhtyar. Checking Potential Validity of XML Documents. Proceedings 7th International Workshop on the Web and Databases (WebDB'2004), June 2004.
[8] Wenzhong Zhao, Alex Dekhtyar, and Judy Goldsmith. Databases for Interval Probabilities, International Journal of Intelligent Systems (IJIS), volume 20, part 2.
[9] Wenzhong Zhao, Alex Dekhtyar, and Judy Goldsmith. A Framework for Management of Semistructured Probabilistic Data. Accepted, Journal of Intelligent Information Systems (JIIS)
[10] Alex Dekhtyar, Ionut E. Iacob, Jerzy W. Jaromczyk, Neil Moore, Dorothy C. Porter. Multihierarchical XML Markup of Image-based Electronic Editions: Issues, Data Structures, and Algorithms. Submitted to special journal issue, May 2004.
[11] Alex Dekhtyar, Ionut Emil Iacob, Jerzy W. Jaromczyk, Kevin Kiernan, Neil Moore, Dorothy Carr Porter, Database Support for Image-based Electronic Editions. Proceedings, 10th International Workshop on Multimedia Information Systems (MIS 2004), August 25-27, 2004, College Park, MD.
[12] Jerzy W. Jaromczyk and Neil Moore. Geometric data structures for multihierarchical XML tagging of manuscripts. Proceedings, 20th European Workshop on Computational Geometry, Seville, Spain, March 2004.
[13] Alex Dekhtyar and Ionut E. Iacob, A Framework for Management of Concurrent XML Markup, Proc. 1st. International Workshop on XML Schema and Data Management (XSDM'03), LNCS, Vol. 2814, pp. 311-322, 2003.
[14] Kevin Kiernan and Kenneth C. Hawley, An Image-Based Electronic Edition of Alfred the Great's Old English Version of Boethius's Consolation of Philosophy, Proc. Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, pp 13-14, 2003
[15] Jerzy W. Jaromczyk and Sandeep Bodapati, An Architecture Promoting Collaborative Research, Teaching and Learning, Proc. Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, pp. 10-11, 2003.
[16] Alex Dekhtyar and Ionut E. Iacob, Management of Data for Building Electronic Editions of Historic Manuscripts, Proc. Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, pp. 11-13, 2003.
[17] Jerzy W. Jaromczyk, Neil Moore, and M. Kowaluk, On visualization of complex image-based markup, accepted for presentation at the 2nd International Conference on Computer Vision and Graphics (ICCVG 2004), September 22-24, 2004, Warsaw, Poland.

Book(s) or other one-time publication(s):

[18] Kevin Kiernan, Digital Facsimiles in Editing: Some Guidelines for Editors of Image-based Scholarly Editions. Accepted for the collection Electronic Textual Editing, ed. John Unsworth, Katherine O'Brien O'Keeffe, and Lou Burnard, Modern Language Association and the TEI Consortium, funded by the Mellon Foundation, 2004.

Technical Reports

[19] Jerzy W. Jaromczyk and Neil Moore. Geometric data structures for multihierarchical XML tagging of manuscripts. University of Kentucky Technical Report #404-04, May 2004.
[20] Ionut E. Iacob, Alex Dekhtyar and Michael I. Dekhtyar. Checking Potential Validity of XML Documents. University of Kentucky Technical Report #403-04, May 2004.
[21] Ionut E. Iacob, Alex Dekhtyar, Wenzhong Zhao, XPath Extension for Querying Concurrent XML Markup. University of Kentucky Technical Report #394-03, February 2004.

Project Impact

The project has advanced the state of the art in the management of XML data and in literary computing. New methods, technologies, and algorithms have been proposed and implemented to (i) assist human editors in creating Image-Based Electronic Editions and (ii) efficiently manage the image-based, concurrent XML markup produced by the editors. The software suite developed within the ARCHway project provides a convenient and powerful framework for preparing Image-Based Electronic Editions of historic documents, as well as tools for deploying the prepared editions. The data management problems addressed within the project extend beyond the current application: management of concurrent XML lies at the heart of a large number of document encoding and document processing projects.

Goals, Objectives, and Targeted Activities

The following high-level objectives have been set in the project:

Objective 1. Development of a universal software platform for creation and maintenance of Image-Based Electronic Editions.
Objective 2. Development of methods, algorithms and data structures for management of concurrent document-centric XML.
Objective 3. Development of methods, algorithms and data structures for management of image-based XML encodings.
Objective 4. Establishment of a teaching and learning infrastructure fostering interdisciplinary collaboration between graduate and undergraduate students in Computer Science and humanities.

Current and Future Activities

This section provides a brief overview of our current activities with respect to the four objectives stated in the previous section.

Objective 1. To address the demands of Image-Based Electronic Edition production, we selected the open-source, Java-based Eclipse platform as the development and deployment environment and chose a three-tier plug-in architecture for the EPT. The top-tier plug-ins (developed within the ARCHway project and our sister Electronic Boethius project) implement user-friendly editorial and presentation tools. The middle layer provides the "glue" that passes information between the top-tier plug-ins. It also encapsulates data access, so that the editorial and presentation tools can be developed independently of the data management back end, which forms the bottom layer of the EPT.
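
As a rough illustration of this three-tier separation, the following Python sketch shows two top-tier tools that communicate only through a middle layer, which is also the only component that touches the data store. This is not the EPT code, which is built as Eclipse plug-ins in Java; every class, method, and topic name below is hypothetical and chosen only for the example.

```python
from typing import Callable, Dict, List, Tuple


class DataStore:
    """Bottom tier (hypothetical): edition data behind a narrow interface."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._records[key] = value

    def load(self, key: str) -> str:
        return self._records[key]


class Mediator:
    """Middle tier (hypothetical): relays updates between tools and hides the store."""

    def __init__(self, store: DataStore) -> None:
        self._store = store
        self._listeners: Dict[str, List[Callable[[str, str], None]]] = {}

    def subscribe(self, topic: str, listener: Callable[[str, str], None]) -> None:
        self._listeners.setdefault(topic, []).append(listener)

    def publish(self, topic: str, key: str, value: str) -> None:
        # Persist first, then notify every tool subscribed to the topic.
        self._store.save(key, value)
        for listener in self._listeners.get(topic, []):
            listener(key, value)

    def fetch(self, key: str) -> str:
        return self._store.load(key)


class TranscriptionTool:
    """Top tier (hypothetical): an editorial tool that produces data."""

    def __init__(self, mediator: Mediator) -> None:
        self._mediator = mediator

    def transcribe(self, folio: str, text: str) -> None:
        self._mediator.publish("transcription", folio, text)


class PreviewTool:
    """Top tier (hypothetical): a presentation tool that consumes data."""

    def __init__(self, mediator: Mediator) -> None:
        self.seen: List[Tuple[str, str]] = []
        mediator.subscribe("transcription", self._on_update)

    def _on_update(self, key: str, value: str) -> None:
        self.seen.append((key, value))


if __name__ == "__main__":
    mediator = Mediator(DataStore())
    preview = PreviewTool(mediator)
    TranscriptionTool(mediator).transcribe("folio-1r", "sample text")
    print(preview.seen, mediator.fetch("folio-1r"))
```

Because neither tool holds a reference to the other or to the store, either tier could be replaced without changing the rest, which is the property the layering is meant to provide.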

Objective 2. Concurrent XML markup arises when a single text document must be encoded with XML elements whose scopes overlap. We have formalized the notion of concurrent XML markup [13,6], described algorithms for converting legacy documents with concurrent markup into our representation [13,6], studied main-memory storage and manipulation of concurrent markup in segment trees [5,10,11,12,19] and GODDAG graphs [4], developed a DOM-like parser/API for concurrent XML [4], and studied the semantics and possible extensions of XPath over GODDAG structures [21]. In addition, we are working on index structures to represent concurrent markup in secondary storage. Finally, we have studied the problem of potential validity of document-centric XML documents and have designed a linear-time checking algorithm [7,20].
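
To make the notion of overlapping scope concrete, the Python sketch below models each element as a character span over one shared text and reports pairs of elements that overlap without nesting, which is exactly the situation a single well-formed XML tree cannot express. It is an illustration only: the span model, the names, and the pairwise check are assumptions made for this example, not the project's representation, which relies on segment trees and GODDAG structures.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """One markup element over the shared text (hypothetical model)."""
    hierarchy: str  # e.g. "physical" or "linguistic"
    tag: str
    start: int      # inclusive character offset
    end: int        # exclusive character offset


def properly_overlap(a: Span, b: Span) -> bool:
    """True if the spans intersect but neither contains the other."""
    return (a.start < b.start < a.end < b.end
            or b.start < a.start < b.end < a.end)


def conflicts(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """All pairs that could not be nested inside a single XML tree."""
    return [(a, b) for a, b in combinations(spans, 2) if properly_overlap(a, b)]


if __name__ == "__main__":
    text = "one two three four"
    spans = [
        Span("physical", "line", 0, 13),     # "one two three"
        Span("physical", "line", 14, 18),    # "four"
        Span("linguistic", "s", 0, 7),       # "one two"
        Span("linguistic", "s", 8, 18),      # "three four"
    ]
    for a, b in conflicts(spans):
        shared = text[max(a.start, b.start):min(a.end, b.end)]
        print(a.tag, "and", b.tag, "overlap on", repr(shared))
```

In this example the first manuscript line and the second sentence share the word "three" but neither contains the other, so they must live in separate hierarchies over the same text. The pairwise scan is quadratic and is used here only for clarity.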

Objective 3. Image-Based Electronic Editions pose a unique data management challenge: all information gathered throughout the editorial process is image-based and must remain tightly connected to the source images at all times. We have developed data structures for storing image-to-markup and image-to-text mappings [10,11], both in main memory (segment trees) and in persistent storage (folio R-trees).
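
The kind of lookup an image-to-markup mapping has to support can be sketched as follows: given a point on a folio image, find every markup element whose region covers it. The Python example below uses a plain linear scan over rectangles as a stand-in; it is not the segment-tree or R-tree structure described above, and the region coordinates and element identifiers are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """Rectangle on a folio image linked to a markup element (hypothetical)."""
    element_id: str
    x0: int  # left edge, inclusive
    y0: int  # top edge, inclusive
    x1: int  # right edge, exclusive
    y1: int  # bottom edge, exclusive

    def contains(self, x: int, y: int) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def elements_at(regions: List[Region], x: int, y: int) -> List[str]:
    """Identifiers of all elements whose region covers the pixel (x, y).

    Linear scan for clarity; a spatial index would answer the same
    query without visiting every region.
    """
    return [r.element_id for r in regions if r.contains(x, y)]


if __name__ == "__main__":
    regions = [
        Region("line-1", 100, 50, 900, 90),
        Region("word-3", 300, 50, 380, 90),
        Region("damage-1", 350, 40, 500, 200),
    ]
    print(elements_at(regions, 360, 60))
    print(elements_at(regions, 450, 150))
```

A single point can belong to several regions at once (a line, a word within it, and a damaged area crossing both), which is why the image mapping and the concurrent markup problem are closely linked.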

Objective 4. The Collaboratory for Research in Computing for Humanities, in addition to providing lab space and other facilities, serves as a shared working environment for Computer Science and humanities students. A number of student projects within the ARCHway framework have been conducted by joint teams of graduate and undergraduate students from both disciplines. Computer Science students learn the art and craft of knowledge engineering, while humanities students gain valuable technical experience.

The following work will be conducted in the near future:

  1. Finish assembly of the EPT.
  2. Develop additional EPT tools.
  3. Implement query processor for XPath over GODDAG graphs.
  4. Finish development of the special-purpose DBMS for storing image-based concurrent XML encoding of the editions.
  5. Integrate all parts of the ARCHway project software.
  6. Integrate the ARCHway project software with the software developed in the sister Electronic Boethius project.

Area Background

Artistic, historical, and literary artifacts are the substance of the humanities disciplines. The ravages of time, fire, water, environmental conditions, and poorly applied technology threaten continued access to physical artifacts, while the technological unknowns about long-term access to and preservation of digital resources present challenges that must be addressed quickly to prevent permanent loss of artifacts no longer accessible as physical objects. Digital technology provides extraordinary means of accessing, examining, and studying the documentary heritage of human civilization. Through our work in the ARCHway project we are providing new access to the three Old English manuscripts that comprise our testbed: The British Library manuscripts of Alfred's Boethius, AElfric's Lives, and Beowulf. These three manuscripts were all damaged by fire (to a greater or lesser extent) in the eighteenth century, and to make these resources truly available for study and research, we need technical tools to help us interpret the damaged texts, assemble the text and images and provide links between them, and disseminate the resulting image-based electronic scholarly editions. 

Area References

  1. P. Durusau and M.B. O'Donnell. Concurrent Markup for XML Documents. In Proc. XML Europe, May 2002.
  2. F. Tian, D.J. DeWitt, J. Chen, and C. Zhang. The Design and Performance Evaluation of Alternative XML Storage Strategies. SIGMOD Record, 31/1, March 2002.
  3. C.M. Sperberg-McQueen and C. Huitfeldt. GODDAG: A Data Structure for Overlapping Hierarchies. In Proc. ACH-ALLC Conference, June 1999.

Potential Related Projects

The Electronic Boethius Project, the Electronic Beowulf Project, the Digital Atheneum, the Digital Image Archive of Medieval Music, the Digital Medievalist Project.

Project Websites

The official ARCHway Project website: http://beowulf.engl.uky.edu/~kiernan/ARCHway/entrance.htm
Collaboratory for Research in Computing for Humanities: http://www.rch.uky.edu/

Illustrations


Architecture of the ARCHway Project

Online Data

http://rch01.rch.uky.edu/~ept/

The Edition Production Technology Team Homepage. Includes resources for EPT development.