Kevin Kiernan, PI
Alex Dekhtyar, co-PI
Jerzy W. Jaromczyk, co-PI
Dorothy C. Porter
The ARCHway Project: Architecture for Research in Computing for the Humanities through Collaborative Research, Teaching, and Learning has brought together Humanities researchers and Computer Scientists to implement a modular, extensible Edition Production Technology (EPT), based on an open and standards-based architecture that incorporates cutting-edge XML technologies and XML-Java applications. The EPT provides new techniques and tools that humanities scholars can use to produce image-based electronic editions of culturally significant artifacts. To build the EPT, we established a technological and pedagogical infrastructure for collaborative research and teaching between computer science and the humanities. The project has involved scholars and students from both English and Computer Science, working together to identify and solve problems of mutual interest in the context of creating both the EPT and image-based electronic scholarly editions using the tools, applications, and technologies incorporated into the EPT. In the process, the project has established a model for collaborative research and teaching by developing innovative methodologies that give students in both computer science and the humanities hands-on experience contributing to a major research project.
[1] Kevin Kiernan. "The nathwylc Scribe and the Beowulf Palimpsest." For a festschrift in honor of Helen Damico, ed. Catherine Karkov and Nancy van Deusen, 2004.
[2] Kevin Kiernan, Alex Dekhtyar, Jerzy W. Jaromczyk, Dorothy C. Porter, Ionut E. Iacob. "Edition Production Technology (EPT) and the ARCHway Project." Forthcoming in DigiCULT Newsletter.
[3] Kevin Kiernan, Jerzy W. Jaromczyk, Alex Dekhtyar, Dorothy Carr Porter, Kenneth Hawley, Sandeep Bodapati, Ionut Emil Iacob. The ARCHway Project: Architecture for Research in Computing for Humanities through Research, Teaching and Learning. Accepted to Literary and Linguistic Computing, 2004.
[4] Ionut E. Iacob, Alex Dekhtyar, Kazuyo Kaneko. Parsing Concurrent XML. Accepted, 6th ACM International Workshop on Web Information and Data Management (WIDM 2004), November 12-13, 2004, Washington, DC.
[5] Jerzy W. Jaromczyk, Miroslaw Kowaluk, Neil Moore. A web interface to image-based concurrent markup using image maps. Accepted, 6th ACM International Workshop on Web Information and Data Management (WIDM 2004), November 12-13, 2004, Washington, DC.
[6] Alex Dekhtyar and Ionut E. Iacob. A Framework For Management of Concurrent XML Markup. Accepted, Data and Knowledge Engineering, 2004.
[7] Ionut E. Iacob, Alex Dekhtyar and Michael I. Dekhtyar. Checking Potential Validity of XML Documents. Proceedings 7th International Workshop on the Web and Databases (WebDB'2004), June 2004.
[8] Wenzhong Zhao, Alex Dekhtyar, and Judy Goldsmith. Databases for Interval Probabilities. International Journal of Intelligent Systems (IJIS), volume 20, part 2.
[9] Wenzhong Zhao, Alex Dekhtyar, and Judy Goldsmith. A Framework for Management of Semistructured Probabilistic Data. Accepted, Journal of Intelligent Information Systems (JIIS).
[10] Alex Dekhtyar, Ionut E. Iacob, Jerzy W. Jaromczyk, Neil Moore, Dorothy C. Porter. Multihierarchical XML Markup of Image-based Electronic Editions: Issues, Data Structures, and Algorithms. Submitted to special journal issue, May 2004.
[11] Alex Dekhtyar, Ionut Emil Iacob, Jerzy W. Jaromczyk, Kevin Kiernan, Neil Moore, Dorothy Carr Porter. Database Support for Image-based Electronic Editions. Proceedings, 10th International Workshop on Multimedia Information Systems (MIS 2004), August 25-27, 2004, College Park, MD.
[12] Jerzy W. Jaromczyk and Neil Moore. Geometric data structures for multihierarchical XML tagging of manuscripts. Proceedings, 20th European Workshop on Computational Geometry, Seville, Spain, March 2004.
[13] Alex Dekhtyar and Ionut E. Iacob. A Framework for Management of Concurrent XML Markup. Proc. 1st International Workshop on XML Schema and Data Management (XSDM'03), LNCS, Vol. 2814, pp. 311-322, 2003.
[14] Kevin Kiernan and Kenneth C. Hawley. An Image-Based Electronic Edition of Alfred the Great's Old English Version of Boethius's Consolation of Philosophy. Proc. Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, pp. 13-14, 2003.
[15] Jerzy W. Jaromczyk and Sandeep Bodapati. An Architecture Promoting Collaborative Research, Teaching and Learning. Proc. Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, pp. 10-11, 2003.
[16] Alex Dekhtyar and Ionut E. Iacob. Management of Data for Building Electronic Editions of Historic Manuscripts. Proc. Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, pp. 11-13, 2003.
[17] Jerzy W. Jaromczyk, Neil Moore, and M. Kowaluk. On visualization of complex image-based markup. Accepted for presentation at the 2nd International Conference on Computer Vision and Graphics (ICCVG 2004), September 22-24, 2004, Warsaw, Poland.
Book(s) or other one-time publication(s):
[18] Kevin Kiernan. Digital Facsimiles in Editing: Some Guidelines for Editors of Image-based Scholarly Editions. Accepted for the collection Electronic Textual Editing, ed. John Unsworth, Katherine O'Brien O'Keeffe, and Lou Burnard. Modern Language Association and the TEI Consortium, funded by the Mellon Foundation, 2004.
[19] Jerzy W. Jaromczyk and Neil Moore. Geometric data structures for multihierarchical XML tagging of manuscripts. University of Kentucky Technical Report #404-04, May 2004.
[20] Ionut E. Iacob, Alex Dekhtyar and Michael I. Dekhtyar. Checking Potential Validity of XML Documents. University of Kentucky Technical Report #403-04, May 2004.
[21] Ionut E. Iacob, Alex Dekhtyar, Wenzhong Zhao. XPath Extension for Querying Concurrent XML Markup. University of Kentucky Technical Report TR 394-03, February 2004.
The project has advanced the state of the art in the management of XML data and in literary computing. New methods, technologies, and algorithms have been proposed and implemented to (i) assist human editors in the creation of Image-Based Electronic Editions and (ii) efficiently manage the image-based, concurrent XML markup produced by the editors. The software suite developed within the ARCHway project provides a convenient and powerful framework for preparing Image-Based Electronic Editions of historic documents, as well as tools for deploying the prepared editions. The data management problems addressed within the project extend beyond the current application: management of concurrent XML lies at the heart of a large number of document encoding and document processing projects.
The following high-level objectives have been set in the project:
This section provides a brief overview of our current activities with respect to the four objectives stated in the previous section.
Objective 1. To address the demands of Image-Based Electronic Edition production, we selected the open-source, Java-based Eclipse platform as the development and deployment environment and chose a three-tier plug-in architecture for the EPT. The top-tier plug-ins (developed within the ARCHway project and our sister Electronic Boethius project) implement user-friendly editorial and presentation tools. The middle layer provides the "glue" that passes information between the top-tier plug-ins. It also encapsulates data access, so that the editorial and presentation tools can be developed independently of the data management back-end, which forms the bottom layer of the EPT.
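The separation of tiers described under Objective 1 can be pictured with a small Java sketch. It is an illustration under stated assumptions only: the names `EditionStore`, `InMemoryStore`, `DataAccess`, and `TranscriptionTool` are hypothetical, and nothing here uses the Eclipse plug-in API or reproduces EPT code. It shows a top-tier tool that talks only to a middle layer, so the storage back-end could be replaced without touching the tool.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a three-tier split; not EPT code and not the Eclipse API.
final class TierSketch {

    /** Bottom-tier contract: how edition data is stored stays behind this interface. */
    interface EditionStore {
        void save(String folioId, String markup);
        String load(String folioId);
    }

    /** One possible back-end, kept in memory so the sketch runs anywhere. */
    static final class InMemoryStore implements EditionStore {
        private final Map<String, String> data = new HashMap<>();
        public void save(String folioId, String markup) { data.put(folioId, markup); }
        public String load(String folioId) { return data.get(folioId); }
    }

    /** Middle tier: the only route from the tools to the stored data. */
    static final class DataAccess {
        private final EditionStore store;
        DataAccess(EditionStore store) { this.store = store; }
        void putMarkup(String folioId, String markup) { store.save(folioId, markup); }
        String getMarkup(String folioId) { return store.load(folioId); }
    }

    /** Top tier: an editorial tool that never sees the back-end directly. */
    static final class TranscriptionTool {
        private final DataAccess data;
        TranscriptionTool(DataAccess data) { this.data = data; }
        void transcribeLine(String folioId, String text) {
            data.putMarkup(folioId, "<line>" + text + "</line>");
        }
    }

    public static void main(String[] args) {
        DataAccess middle = new DataAccess(new InMemoryStore());
        new TranscriptionTool(middle).transcribeLine("folio-1r", "sample text");
        System.out.println(middle.getMarkup("folio-1r"));
    }
}
```

Because `TranscriptionTool` depends only on `DataAccess`, a database-backed implementation of `EditionStore` could be substituted for `InMemoryStore` without changing the tool.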
Objective 2. Concurrent XML markup occurs when a single text document needs to be encoded with XML elements with overlapping scope. We have formalized the notion of concurrent XML markup [6,13], described algorithms for converting legacy documents with concurrent markup into our representation [6,13], studied storage and manipulation of concurrent markup in main memory in segment trees [5,10,11,12,19] and GODDAG graphs [4], developed a DOM-like parser/API for concurrent XML [4], and studied the semantics and possible extensions of XPath over GODDAG structures [21]. In addition, we are working on index structures to represent concurrent markup in secondary storage. Finally, we have studied the problem of potential validity of document-centric XML documents and have designed a linear-time checking algorithm [7,20].
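To make "overlapping scope" concrete, the following Java sketch treats each markup element as a range of character offsets and reports pairs that intersect without one containing the other, which is the situation a single well-formed XML tree cannot express. The `Span` class, the hierarchy labels, and the offsets are hypothetical choices made for this illustration; this is not the project's representation, parser, or GODDAG structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of overlapping markup; not the project's data model.
final class OverlapSketch {

    /** A markup element covering the text offsets [start, end) in one hierarchy. */
    static final class Span {
        final String hierarchy;
        final String name;
        final int start;
        final int end;
        Span(String hierarchy, String name, int start, int end) {
            this.hierarchy = hierarchy;
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /** True when the spans share text but neither one contains the other. */
    static boolean overlaps(Span a, Span b) {
        boolean intersect = a.start < b.end && b.start < a.end;
        boolean aInsideB = b.start <= a.start && a.end <= b.end;
        boolean bInsideA = a.start <= b.start && b.end <= a.end;
        return intersect && !aInsideB && !bInsideA;
    }

    /** Lists every pair of spans that cannot be nested inside a single XML tree. */
    static List<String> conflicts(List<Span> spans) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < spans.size(); i++) {
            for (int j = i + 1; j < spans.size(); j++) {
                if (overlaps(spans.get(i), spans.get(j))) {
                    found.add(spans.get(i).name + "/" + spans.get(j).name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // A sentence that begins on one manuscript line and ends on the next.
        List<Span> spans = Arrays.asList(
            new Span("physical", "line1", 0, 12),
            new Span("physical", "line2", 12, 25),
            new Span("textual", "sentence", 5, 20));
        System.out.println(conflicts(spans)); // prints [line1/sentence, line2/sentence]
    }
}
```

The pairwise check is quadratic and is meant only to show the condition itself. The two manuscript lines do not conflict with each other, but the sentence conflicts with both, so the physical and textual hierarchies need separate, concurrent markup.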
Objective 3. Image-Based Electronic Editions pose a unique challenge for the management of edition data: all information gathered throughout the editorial process is image-based and must be tightly connected to the source images at all times. We have developed data structures for storing image-to-markup and image-to-text mappings [10,11] both in main memory (segment trees) and in persistent storage (folio R-trees).
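The kind of question an image-to-markup mapping has to answer can be sketched as follows: given a pixel on a folio image, which markup elements cover it? The Java sketch below answers that with a plain linear scan over rectangles. That is a deliberate simplification and not the segment trees or folio R-trees named above; the `Region` class, element names, and coordinates are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an image-to-markup lookup.
// Uses a linear scan for clarity; it is NOT a segment tree or an R-tree.
final class RegionLookupSketch {

    /** A rectangular region of a folio image tied to one markup element. */
    static final class Region {
        final String element;
        final int x, y, width, height;
        Region(String element, int x, int y, int width, int height) {
            this.element = element;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
        /** True when the pixel (px, py) lies inside this rectangle. */
        boolean contains(int px, int py) {
            return px >= x && px < x + width && py >= y && py < y + height;
        }
    }

    private final List<Region> regions = new ArrayList<>();

    void add(Region region) { regions.add(region); }

    /** Returns the names of all elements whose region contains the pixel (px, py). */
    List<String> elementsAt(int px, int py) {
        List<String> hits = new ArrayList<>();
        for (Region r : regions) {
            if (r.contains(px, py)) {
                hits.add(r.element);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        RegionLookupSketch folio = new RegionLookupSketch();
        folio.add(new Region("line", 100, 200, 800, 60));
        folio.add(new Region("word", 150, 205, 120, 50));
        folio.add(new Region("damage", 600, 180, 90, 90));
        System.out.println(folio.elementsAt(160, 220)); // prints [line, word]
    }
}
```

A scan like this costs time proportional to the number of regions on every query. Spatial indexes such as the structures mentioned in the text exist to answer the same containment query without visiting every region.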
Objective 4. The Collaboratory for Research in Computing for Humanities, in addition to providing lab space and other facilities, serves as a collaboratory for Computer Science and humanities students. A number of student projects within the ARCHway framework have been conducted by joint teams of graduate and undergraduate students from both disciplines. Computer Science students learn the art and craft of knowledge engineering, while humanities students gain valuable technical experience.
The following work will be conducted in the near future:
Artistic, historical, and literary artifacts are the substance of the humanities disciplines. The ravages of time, fire, water, environmental conditions, and poorly applied technology threaten continued access to physical artifacts, while the technological unknowns about long-term access to and preservation of digital resources present challenges that must be addressed quickly to prevent permanent loss of artifacts no longer accessible as physical objects. Digital technology provides extraordinary means of accessing, examining, and studying the documentary heritage of human civilization. Through our work in the ARCHway project we are providing new access to the three Old English manuscripts that comprise our testbed: the British Library manuscripts of Alfred's Boethius, Ælfric's Lives, and Beowulf. These three manuscripts were all damaged by fire (to a greater or lesser extent) in the eighteenth century, and to make these resources truly available for study and research, we need technical tools to help us interpret the damaged texts, assemble the text and images and provide links between them, and disseminate the resulting image-based electronic scholarly editions.
The Electronic Boethius Project, the Electronic Beowulf Project, the Digital Atheneum, the Digital Image Archive of Medieval Music, and the Digital Medievalist Project.
The official ARCHway Project website: http://beowulf.engl.uky.edu/~kiernan/ARCHway/entrance.htm
Collaboratory for Research in Computing for Humanities: http://www.rch.uky.edu/
The Edition Production Technology Team homepage (includes resources for EPT development): http://rch01.rch.uky.edu/~ept/