Thursday, June 25, 2009

DH09: The Digital Classicist presentations: on reusing data

'The Digital Classicist: Re-use of Open Source and Open Access Publications in Ancient Studies'. Three presentations:

1) 'Linking and Querying Ancient Texts (LaQuAT)' by Gabriel Bodard et al. (digitalclassicist.org/wip/seminar.xml). The project makes raw data available to be reused. Two technologies were mentioned: OGSA-DAI, which is concerned with sharing structured data, and OGSA-DQP, for distributed query processing.

LaQuAT data resources: HGV (FileMaker Pro; in German, with views used to translate the data); Projet Volterra (Access database with Perl-script-based publication; mainly text-based searches); and the IAph XML databases (XML data source in EpiDoc; overlaps in time with Volterra). http://laquat.cerch.kcl.ac.uk/

2) 'Data and Code for Ancient Geography' by Thomas Elliott, on the project called 'Pleiades'. Open source software is the backbone of Pleiades; the project wouldn't exist without it. Demos of the project followed, and live Twitter feeds appeared on screen as the demos were presented. The data comes from the Barrington Atlas. http://atlantides.org/trac/pleiades/wiki/PleiadesSoftware, http://openlayers.org, www.plone.org, trac.gispython.org/lab/wiki

e.g. Termessos in the Pleiades Beta Portal, which uses Google Maps.
License: http://atlantides.org/trac/pleiades/wiki/PleiadesContributorAgreement.

3) 'Recent work at the Center for Hellenic Studies (CHS): reuse of digital resources' by Neel Smith of College of the Holy Cross. Three projects: the Homer Multitext Project, the First Thousand Years of Greek Project (FTYG), and the infrastructure for other projects (CITE architecture). The mission of CHS is to share and expand knowledge of classical studies. Reuse is the mandate.

Licenses are important: proprietary licenses like that of the TLG complicate reuse of other material, and poorly written licenses complicate things too. Recognized open licenses simplify collaboration, as in the Multitext experience with the Marciana (site visited in Italy that had Homer texts).

Data: in well-known and/or well-specified formats; data models; data structures; identifiers. Data in familiar formats in FTYG: the project used the TLG word list, Perseus' morphological parser, the TEI-compliant LSJ from Perseus, and analytical indices. Sharing data models: rich markup; a bibliographical model of versions in hierarchical relation (as used by bibliographers); and the textual model in CTS, which combines four structural properties of text, defines the functional equivalence of different implementations, and is realized in two distinct data models (XML texts with a TextInventory, and a tabular structure of nodes).
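To make the idea of "functional equivalence" across two data models a bit more concrete, here is a minimal sketch in Python. It is not the CTS implementation: the nested dictionary, the placeholder passage strings, and the helper names (flatten, lookup_tree, lookup_table) are all hypothetical, chosen only to show a hierarchical text and a tabular list of nodes answering the same citation with the same result.

```python
# Hypothetical hierarchical representation: book -> line -> text.
# The strings are placeholders, not real passages.
hierarchical = {
    "1": {"1": "text of line 1.1", "2": "text of line 1.2"},
    "2": {"1": "text of line 2.1"},
}


def flatten(tree, prefix=()):
    """Turn the nested structure into tabular rows of (citation, text)."""
    rows = []
    for key, value in tree.items():
        ref = prefix + (key,)
        if isinstance(value, dict):
            rows.extend(flatten(value, ref))
        else:
            rows.append((".".join(ref), value))
    return rows


def lookup_tree(tree, citation):
    """Resolve a dotted citation such as '1.2' by walking the hierarchy."""
    node = tree
    for part in citation.split("."):
        node = node[part]
    return node


def lookup_table(rows, citation):
    """Resolve the same citation against the tabular rows."""
    return dict(rows)[citation]


table = flatten(hierarchical)

# Both representations should return the same passage for the same citation.
for citation, _ in table:
    assert lookup_tree(hierarchical, citation) == lookup_table(table, citation)
```

The point of the sketch is only that a client asking for a citation does not need to know which of the two models sits behind the answer.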

Identifiers in FTYG: inventory of extant Greek poetry known from MS transmission: where possible, reuses identifiers from TLG canon (not always possible); inventory of Greek lexical entities: where possible, reuse identifiers from Perseus edition of LSJ.

Shared software: source code, running services with specified APIs, and [something else]
Source code can be modified: tlgu, the EpiDoc transcoding transformer, and the Diogenes software. Reusable modules and code libraries: the EpiDoc transformer is useful because, instead of dumping data on the web, the app can be downloaded and used with the data; CTS URN manipulation is handled with the CTS library.
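As an illustration of what "CTS URN manipulation" can look like, here is a minimal sketch in Python. It is not the CTS library mentioned in the talk. It assumes the URN syntax as later published for CTS (urn:cts:namespace:textgroup.work.version:passage), and the class and function names (CtsUrn, parse_cts_urn) are hypothetical. The exemplar level of the work hierarchy is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    """Hypothetical container for the parts of a CTS URN."""
    namespace: str
    textgroup: str
    work: Optional[str] = None
    version: Optional[str] = None
    passage: Optional[str] = None


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its components (assumed syntax, see lead-in)."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")

    namespace = parts[2]
    work_fields = parts[3].split(".")
    if not 1 <= len(work_fields) <= 3 or not all(work_fields):
        raise ValueError(f"unsupported work component: {parts[3]!r}")

    # Pad missing levels (work, version) with None.
    padded = work_fields + [None] * (3 - len(work_fields))
    # An absent or empty passage component means the whole work is meant.
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace, padded[0], padded[1], padded[2], passage)


# Example: TLG canon identifiers reused inside a URN (Iliad, book 1, line 1).
iliad = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1")
```

Here iliad.textgroup is "tlg0012" and iliad.work is "tlg001", which shows how identifiers taken from the TLG canon can be carried inside the URN rather than reinvented.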

Texts and specifications: simple XML structure encodes request; expected replies encoded, etc.

CTS (Canonical Text Services)
