Edition Production Technology (EPT) and the ARCHway Project[1]

Kevin Kiernan, Principal Investigator, Department of English, University of Kentucky

Alex Dekhtyar, co-PI, Department of Computer Science, University of Kentucky

Jerzy W. Jaromczyk, co-PI, Department of Computer Science, University of Kentucky

Dorothy Carr Porter, Research in Computing for Humanities, University of Kentucky

Ionut Emil Iacob, Department of Computer Science, University of Kentucky

 

 

The ARCHway Project (Architecture for Research in Computing for Humanities through collaborative research, teaching, and learning) is an unusual collaboration of scholars and students in computer science and the humanities who seek to identify and solve problems of mutual importance in building image-based electronic editions of significant cultural materials. To accomplish these goals, ARCHway is developing a workbench of integrated tools, called Edition Production Technology, or EPT. Working on the underlying programming platform, Eclipse, we are creating new editing tools, integrating tools already developed under other projects, and using the combined toolkit to prepare image-based electronic editions.

I. Background

The concept of electronic editing tools began under the Electronic Beowulf project, when the need for these tools became evident, and continued under the Digital Atheneum project with the development of a Glossary Tool.[2]  The production of focused, well-designed, XML-based editing tools began in earnest in Fall 2002 under the Electronic Boethius project (see figure 1).[3]

Fig. 1 - Electronic Boethius Electronic Editions Environment (E3)

In this phase the Electronic Boethius team, under the programming leadership of Ionut Emil Iacob, created a Java prototype EPT called the Electronic Editions Environment, or E3, which integrated the Glossary Tool with text and an XML tagging capability to facilitate searching of text and glossary.  During this period the team also developed several standalone tools in Java, designed to incorporate images, which E3 also integrated.[4]  Thus, while we had a prototype of EPT in E3, the continual requirement to integrate emerging new tools posed a significant programming problem.

II. The Integrated Platform

When the ARCHway project began in Spring 2003, Jerzy W. Jaromczyk, one of two co-PIs in Computer Science, advised us to reprogram these standalone tools and develop all new tools using the Eclipse programming environment, an open-source platform originally developed by IBM, now broadly used and actively enhanced by the open-source community.[5] Although this move required reprogramming of our editing tools, Eclipse suited the needs of ARCHway and the Electronic Boethius in two ways. First, it provided an effective software architecture for the production and deployment of the EPT, which is now organized as a set of distinct plugin tools that work together through Eclipse. Each tool is responsible for a specific editing or administrative task, and the tools can also work with one another through the platform, effectively borrowing functionality and allowing for the creation of new tools without having to reprogram established functions.[6] Second, Eclipse is an ideal environment for the teaching and learning aims of ARCHway. The EPT's plugin design makes it possible to assign individual tools to research assistants, to computer science students as class projects, and to teams of computer science and humanities students collaborating on Master's and Informatics projects.
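The idea of tools "borrowing functionality" through the platform can be illustrated with a minimal sketch. It deliberately avoids the Eclipse API: the `ToolRegistry` class, the service names, and the `<w>` element are all hypothetical, chosen only to show one tool reusing a function that another tool contributed rather than reimplementing it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the platform: tools register named services here. */
class ToolRegistry {
    private final Map<String, Function<String, String>> services = new LinkedHashMap<>();

    /** A tool contributes one named function to the shared platform. */
    void register(String name, Function<String, String> service) {
        services.put(name, service);
    }

    /** Another tool looks the function up instead of reprogramming it. */
    Function<String, String> lookup(String name) {
        Function<String, String> service = services.get(name);
        if (service == null) {
            throw new IllegalArgumentException("No tool provides: " + name);
        }
        return service;
    }
}

class PluginSketch {
    /** Builds a registry in which a glossary tool reuses a tagger tool's function. */
    static ToolRegistry buildRegistry() {
        ToolRegistry registry = new ToolRegistry();
        // The "tagger" tool wraps a word in an (assumed) <w> element.
        registry.register("tagger.wrapWord", word -> "<w>" + word + "</w>");
        // The "glossary" tool borrows the tagger's function through the registry.
        registry.register("glossary.entry", word ->
                registry.lookup("tagger.wrapWord").apply(word) + " (glossary entry)");
        return registry;
    }

    public static void main(String[] args) {
        ToolRegistry registry = buildRegistry();
        // prints: <w>cyning</w> (glossary entry)
        System.out.println(registry.lookup("glossary.entry").apply("cyning"));
    }
}
```

Because the glossary service resolves the tagger service at call time, a replacement tagger registered under the same name would be picked up without any change to the glossary code.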

The programming teams use the appropriate Application Programming Interfaces (APIs) available in the Eclipse Plugin Development Platform to build all editing tools. To ensure that these tools are in fact useful for editors, the programmers always work under the guidance of the editor/PI and in consultation with the humanities research assistants. The programmers can integrate tools-in-progress into the EPT, because Eclipse comes with built-in support for the Concurrent Versions System (CVS), which permits multiple users to modify the same files without overwriting one another's work.[7] CVS is especially valuable for ARCHway, as everyone works collaboratively on both programming and editing. CVS also contributes in important ways to the teaching and learning goals of ARCHway, because for testing and grading purposes the Principal Investigators require continual, reliable access to the most recent tools and editing projects.

Another advantage of Eclipse is that it helps achieve uniformity, adaptability, and extensibility in areas that can raise high hurdles for complex humanities computing projects. The Graphical User Interfaces (GUIs) developed under Eclipse are attractive and uniform, and humanities editors can easily configure them without new programming support by using XML configuration files. Eclipse works across operating systems, allowing Windows and Linux (and Macintosh, to some extent) to support the same tools with the same native appearance. The programming platform also provides automatic updating for the emerging tools, a critical capability that encourages the refinement and expansion of features as well as the correction of programming errors or bugs in the EPT. In the long run this capability will enable automatic online upgrades to the completed electronic editions. With its strong support for XML and its high-quality imaging capabilities, Eclipse is thus in many ways ideal for the development of image-based electronic editions.

III. Edition Production Technology

The EPT now consists of three software layers: one for editing and administrative tools, one for middleware, and one for data management (figure 2).

Fig. 2 - ARCHway Model for Edition Production Technology (EPT)

The editing and administrative tools provide the functionality for managing projects and editing primary resource images and text, just as the presentation tools will eventually provide the functionality for using the completed image-based editions in interactive displays and searching facilities. The middleware layer, under the guidance of Jaromczyk and the other Computer Science co-PI, Alex Dekhtyar, provides the utility plugins that allow the upper-level tools to communicate with each other and share image-enriched information of all kinds from the data management layer, Dekhtyar's domain. The data management layer contains the routines devoted to storage, maintenance, and retrieval of the information from the image-based electronic edition. The utility plugins in the middleware layer provide functionality that is shared with the editing and administrative tools. The Project Explorer organizes current projects and completed editions, and provides a logical view of all project files regardless of their physical location. The Data Source Layer acts as a middle ground between the EPT editing tools and the project files and provides the physical location of the project files for the editing tools. One utility plugin that the editor uses is the Keyboard, set by default to the Old English character set. Like all the plugins, the Keyboard plugin is easily modified in configuration files to support character sets for other languages and other projects.
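As a rough illustration of how a configuration file could drive the Keyboard plugin, the sketch below reads a small XML description of special characters using the standard Java XML parser. The format itself is an assumption: the `keyboard` and `key` elements and the `label` and `char` attributes are invented for this example and are not taken from the EPT.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class KeyboardConfigSketch {
    // Hypothetical configuration: element and attribute names are assumptions.
    static final String OLD_ENGLISH_CONFIG =
            "<keyboard name=\"Old English\">"
          + "<key label=\"thorn\" char=\"\u00FE\"/>"
          + "<key label=\"eth\" char=\"\u00F0\"/>"
          + "<key label=\"ash\" char=\"\u00E6\"/>"
          + "</keyboard>";

    /** Returns the configured keys as label -> character, in document order. */
    static Map<String, String> loadKeys(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList keys = doc.getElementsByTagName("key");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < keys.getLength(); i++) {
                Element key = (Element) keys.item(i);
                result.put(key.getAttribute("label"), key.getAttribute("char"));
            }
            return result;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalStateException("Could not read keyboard configuration", e);
        }
    }

    public static void main(String[] args) {
        // Print each key with its Unicode code point to avoid console encoding issues.
        loadKeys(OLD_ENGLISH_CONFIG).forEach((label, ch) ->
                System.out.println(label + " -> U+"
                        + String.format("%04X", ch.codePointAt(0))));
    }
}
```

Under this assumed format, supporting another language would mean supplying a different XML file; the parsing code stays unchanged.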

IV. The Editing Tools

The main tools originally developed under the Electronic Boethius project include a ScripText environment for integrating images and text; a Glossary Tool for building comprehensive glossaries from an XML text file; a Tagger for inserting XML markup, based on the images, in the text file; a DucType tool for paleographical description, analysis, and encoding; and an OverLay tool for comparing and encoding multiple images of a folio taken under different lighting conditions. Although first developed in the Electronic Editions Environment (E3), the tools held a somewhat precarious existence as independent, stand-alone programs. Under ARCHway the reprogrammed tools now form the basic editing toolkit for the EPT. We are continually developing new tools under both projects and integrating them into the EPT. An editor can organize these tools in any desired combinations, called "perspectives" (see figure 3), to suit different editing and administrative tasks.

Fig. 3 - OverLay perspective with xTagger and Statistics tools

The editor can save perspectives and navigate between different perspectives, to perform any number of editorial and administrative tasks in the same project (see figure 4).
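The notion of a perspective as a saved, named combination of tools can be sketched in a few lines. This is a plain-Java illustration, not the Eclipse perspective mechanism: the `PerspectiveSketch` class and its methods are hypothetical, and the tool combinations simply follow the captions of figures 3 and 4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of perspectives: named tool combinations an editor can save. */
class PerspectiveSketch {
    private final Map<String, List<String>> saved = new LinkedHashMap<>();
    private String current;

    /** Saves a combination of tools under a perspective name. */
    void save(String name, List<String> tools) {
        saved.put(name, Collections.unmodifiableList(new ArrayList<>(tools)));
    }

    /** Switches to a saved perspective and returns the tools it shows. */
    List<String> switchTo(String name) {
        List<String> tools = saved.get(name);
        if (tools == null) {
            throw new IllegalArgumentException("Unknown perspective: " + name);
        }
        current = name;
        return tools;
    }

    /** Name of the perspective currently shown, or null if none has been opened. */
    String current() {
        return current;
    }

    /** Builds the two perspectives named in the figure captions. */
    static PerspectiveSketch demo() {
        PerspectiveSketch workbench = new PerspectiveSketch();
        workbench.save("OverLay", Arrays.asList("OverLay", "xTagger", "Statistics"));
        workbench.save("ImagText", Arrays.asList("ImagText", "xTagger", "Glossary"));
        return workbench;
    }

    public static void main(String[] args) {
        PerspectiveSketch workbench = demo();
        System.out.println(workbench.switchTo("OverLay"));
        System.out.println(workbench.switchTo("ImagText"));
    }
}
```

Note that xTagger appears in both saved combinations: a perspective only arranges tools, so the same tool can serve several editorial tasks within one project.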

Fig. 4 - ImagText perspective with xTagger and Glossary tools

V. Using the EPT

To keep the EPT as adaptable, extensible, and interoperable as possible, ARCHway is using in its testbed three important manuscripts from the British Library: Beowulf, Alfred the Great's Old English translation of Boethius's Consolation of Philosophy, and Ælfric's Lives of Saints. One by one and as a group these fire-damaged manuscripts[8] present editors with widely different editing problems and computer scientists with equally challenging technical problems. The Electronic Beowulf at once serves as our guide for a fully functioning image-based electronic edition (see figure 5) and as a "legacy document" we expect the EPT to transform in the course of time.

Fig. 5 - Electronic Beowulf with image, text, glossary, textual note, and search facility

While we are creating the EPT for editing Old English manuscripts, the ARCHway project has the long-term goal of contributing ideas and practical solutions for preserving and propagating any hand-written materials from the vast and varied heritage of world culture. We believe that ARCHway's EPT architecture for building image-based electronic editions is an effective model for achieving these ends.



[1] ARCHway is supported by the National Science Foundation under Grant No. 0219924, awarded pursuant to the authority of the NSF Act of 1950 (42 U.S.C. 1861 et seq.), which is subject to GC-1 Grant General Conditions (10/98) and is made in accordance with the provisions of NSF 98-63, "Information Technology Research."

[2] For details of these projects, follow the links at http://www.rch.uky.edu.

[3] The Electronic Boethius Project is funded by a Collaborative Research Award from the National Endowment for the Humanities and the Andrew W. Mellon Foundation, with images provided by The British Library and the Bodleian Library, Oxford.

[4] Chengdong Li was responsible for programming most of the tools that manipulate images.  The Electronic Boethius Project is funded by a Collaborative Research Award from the National Endowment for the Humanities. It received additional support for the development of these editing tools from the Andrew W. Mellon Foundation.  The British Library and the Bodleian Library, Oxford, provided the digital images for this project.

[5] For more information, see Eclipse Platform Technical Overview, Object Technology International, Inc., February 2003 (http://www.eclipse.org/whitepapers/eclipse-overview.pdf)

[6] For a more detailed description of the software, and the ARCHway project in general, see Kiernan et al. "The ARCHway Project: Architecture for research in computing for humanities through research, teaching, and learning," forthcoming in Literary and Linguistic Computing.

[7] For details about the open-source Concurrent Versions System, visit the homepage at http://www.cvshome.org.

[8] Andrew Prescott. 'Their Present Miserable State of Cremation': the Restoration of the Cotton Library. Sir Robert Cotton as Collector: Essays on an Early Stuart Courtier and His Legacy. C. J. Wright, ed. London: British Library Publications, 1997. 391-454.