Tuesday, June 23, 2009

DH09: Three presentations on Reading Ancient, Medieval and Modern Documents

1) Towards an Interpretation Support System for Reading Ancient Documents (missed part of this presentation, see abstracts). Concepts and web sites mentioned:

ISS prototype
Evidence-based decision process
EpiDoc
Contextual coding markup
index searcher (Ajax Live Search)
RESTful Web Service
Vindolanda Knowledge Base Web Service (Vindolanda Tablets)
eSAD Project

2) 'Image as Markup: adding Semantics to Manuscript Images', presented by Hugh Cayless, New York University.

The img2xml Project (began June 1, 2009) uses SVG tracing of the text on a manuscript page as a basis for linking page images to transcriptions, annotations, etc. The goal is to produce a web publishing environment that uses an open source, open standard stack to integrate the display of text, annotation and image. The test case is a 19th-century journal, the prose of a student, James Dusedery (sic), from the 1840s. Cayless contrasted raster, zoomed raster and vector imaging: the vector version is 'a shape with a black fill' rather than the pixelization of raster images. In the SVG, 'words' consist of combinations of shapes, and the relationship between shapes is based entirely on their position. To align the image with the transcription, we must make the structure of the SVG explicit. "OHCO" could be done using groups (<g>), but it is a non-starter because handwritten words overlap (descenders, e.g. the letter 'f' descending into the word below it). Cayless drew on Drucker and McGann (Images as the Text: Pictographs and Pictographic Logic). In describing an entity: image --> abstract entity (concept) --> linguistic entity (word). Not only the symbols on the page matter but the structure as well, i.e. articulating the relationships among entities is important too.
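The idea that 'words' are combinations of shapes related only by position can be sketched in a few lines of Python. This is an illustration written for these notes, not img2xml's code: the `Shape` class, the `max_gap` threshold and the sample coordinates are all assumptions, and the sketch handles a single line of writing only, so it sidesteps the descender overlap problem mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    """Bounding box of one traced shape (hypothetical structure, not img2xml's)."""
    id: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def group_into_words(shapes, max_gap=5.0):
    """Group shapes on one line into words, using horizontal position only.

    A horizontal gap wider than max_gap (an assumed threshold) starts a new word.
    """
    words = []
    right_edge = None
    for shape in sorted(shapes, key=lambda s: s.x):
        if right_edge is None or shape.x - right_edge > max_gap:
            words.append([])
            right_edge = shape.x + shape.w
        else:
            right_edge = max(right_edge, shape.x + shape.w)
        words[-1].append(shape.id)
    return words


def align(words, transcription):
    """Pair each transcribed token with the shape ids of the matching word."""
    tokens = transcription.split()
    if len(tokens) != len(words):
        raise ValueError("transcription and shape groups differ in length")
    return list(zip(tokens, words))


# Made-up coordinates for two words of two shapes each.
shapes = [
    Shape("s1", 0, 0, 8, 12),
    Shape("s2", 9, 0, 6, 12),
    Shape("s3", 30, 0, 10, 12),
    Shape("s4", 41, 2, 5, 10),
]
alignment = align(group_into_words(shapes), "my journal")
```

Real handwriting would need line detection and some way of handling shapes that cross from one line into the next, which is exactly where a strict grouping hierarchy breaks down.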

Conclusions: the act of reading rescues us from attending to the presentational logic of the text (Drucker and McGann); you can do many things with an XML sketch of a text; a machine-actionable language is needed to run this (RDF perhaps?); ids can be added to the SVG so that it can be annotated with TEI; and you can create URLs to any point in the text. img2xml: http://github.com/hcayless/img2xml/tree/master
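The point about ids and URLs can be illustrated with Python's standard library. This is a rough sketch under stated assumptions, not how img2xml does it: the tiny SVG string, the `w` id prefix and the example.org base URL are all invented for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix

# A made-up two-shape SVG standing in for a traced manuscript page.
svg_source = (
    f'<svg xmlns="{SVG_NS}">'
    '<path d="M0 0h8v12z"/>'
    '<path d="M30 0h10v12z"/>'
    "</svg>"
)


def add_ids(svg_text, prefix="w"):
    """Give every <path> an id so that it can be addressed from outside."""
    root = ET.fromstring(svg_text)
    ids = []
    for n, path in enumerate(root.iter(f"{{{SVG_NS}}}path"), start=1):
        path.set("id", f"{prefix}{n}")
        ids.append(path.get("id"))
    return ET.tostring(root, encoding="unicode"), ids


def fragment_urls(base_url, ids):
    """Build one URL per shape using a fragment identifier."""
    return [f"{base_url}#{shape_id}" for shape_id in ids]


annotated_svg, ids = add_ids(svg_source)
# Hypothetical location of the published page image.
urls = fragment_urls("http://example.org/journal/page1.svg", ids)
```

Once each shape has a stable id, a transcription or annotation file can point at it with an ordinary URL.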

Interesting web site for transforming bitmaps into vector graphics: Potrace

3) The third presentation was titled 'Computer-aided Palaeography: Present and Future' by Peter Stokes, Cambridge University. Palaeography was defined as 'the study of medieval and ancient handwriting.' Problems with palaeography: manuscripts are difficult to read, and a single page may carry multiple authors writing over time. Palaeographers offer subjective opinions rather than objective analysis. Truth...'depends on the authority of the palaeographer and the faith of the reader.' We must replace the qualitative data provided by the palaeographer with quantitative data. Rather than asking 'were these two pieces of writing written by the same author?', ask instead 'is it even possible to objectively decide whether these two pieces of writing were written by the same person?' There is some objective basis for thinking that this process can be quantified: fully automatic systems have been developed that can identify the writer accurately 95% of the time (Srihari 2002, 2008). This is computational palaeography. See: The Practice of Handwriting Identification, The Library (2007), p266, n 27. What can we do with this material? Requirements: results must be 'cross-examinable', meaning interpretable, reproducible and communicable, and must allow variation and flexibility in analyzing handwriting. Software: the 'Hand Analyser' records both the data and the process, data and processes can be shared (Java), and the system will never be 'finished' but is extensible with plugins.
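To make the 'cross-examinable' requirement concrete, here is a minimal Python sketch of a quantitative comparison that records its own process alongside the result. It is not the Hand Analyser (which is written in Java) and not Srihari's method: the feature names, the values, the Euclidean distance and the threshold are all assumptions chosen for illustration.

```python
import math


def distance(hand_a, hand_b):
    """Euclidean distance over the features that both hands share."""
    features = sorted(hand_a.keys() & hand_b.keys())
    total = sum((hand_a[f] - hand_b[f]) ** 2 for f in features)
    return math.sqrt(total), features


def compare(hand_a, hand_b, threshold):
    """Return the verdict together with everything needed to reproduce it."""
    d, features = distance(hand_a, hand_b)
    return {
        "method": "euclidean",     # which measure was used
        "features": features,      # which measurements were compared
        "threshold": threshold,    # the assumed cut-off
        "distance": d,
        "same_writer_candidate": d <= threshold,
    }


# Invented, already-scaled measurements for two scribal hands.
hand_a = {"slant": 3.0, "aspect": 4.0}
hand_b = {"slant": 0.0, "aspect": 0.0}
record = compare(hand_a, hand_b, threshold=6.0)
```

Because the returned record names the method, the features and the threshold, another scholar can rerun the comparison with different choices and see how the verdict changes, instead of taking the conclusion on authority.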
