Schedule

DATeCH2017

Satellite workshops

Tuesday, May 30th

8:30: Registration

10:00 – 16:00 The journey from physical to digital and advancements in culture heritage digitisation

9:00 – 18:00 TRACER tutorial for computational text reuse detection

13:00 – 17:00 TextGrid user workshop

18:15 – 19:30: GCDH Evening Lectures: Sara Tonelli (FBK-Trento): “NLP for Historical Content Analysis: Ongoing work and Open challenges” Link: http://www.gcdh.de/en/events/calendar-view/gcdh-evening-lectures-sara-tonelli-turin-nlp-historical-content-analysis-ongoing-work-and-open-challenges/

Wednesday, May 31st

8:30: Registration

9:00 – 16:30 Handwritten Text Recognition – Transkribus Workshop (project READ)

13:00 – 17:00 PoCoTo user workshop

9:00 – 17:00 IMPACT Members Meeting (only for IMPACT members)

Main conference

Thursday, June 1st

8:30: Registration

9:00 – 9:15: Conference Opening

9:15 – 10:45: Session 1. Transcription

(Chaired by Sinai Rusinek)

- Jesper Zedlitz and Norbert Luttenberger. 750 Volunteers Transcribing 31,000 Pages with 8.5 million Entries Online – an Evaluation
- Enrique Manjavacas and Peter Petre. Enabling Annotation of Historical Corpora in an Asynchronous Collaborative Environment
- Manuel Burghardt and Sebastian Spanner. Allegro: User-centered Design of a Tool for the Crowdsourced Transcription of Handwritten Music Scores
- Jesper Zedlitz and Norbert Luttenberger. Enhancing Human-Transcribed Records by Using OCR

10:45 – 11:15: Coffee break

11:15 – 13:05 Session 2. Natural Language Processing

(Chaired by Klaus Schulz)

- Filip Graliński, Rafał Jaworski, Łukasz Borchmann and Piotr Wierzchoń. The RetroC challenge: how to guess the publication year of a text?
- Cătălina Mărănduc, Cătălin Mititelu and Radu Simionescu. Parsing Romanian Specialized Dictionaries Structured in Nests
- Markus Paluch, Gabriela Rotari, David Steding, Maximilian Weß, Maria Moritz and Marco Büchler. Analysis of part-of-speech tagging of historical German texts
- Alessio Salomoni. Dependency Parsing on Late-18th-Century German Aesthetic Writings. A Preliminary Inquiry into Schiller and F. Schlegel.
- Gustavo Candela, Maria Pilar Escobar Esteban and Borja Navarro-Colorado. In search of Poetic Rhythm: Poetry retrieval through text and metre

13:05 – 13:15: DARIAH presentations

- Mike Mertens. DARIAH-EU.
- Stefan Schmunck. DARIAH-DE.

13:15 – 14:00: Lunch break

14:00 – 15:30: Session 3. OCR and Postprocessing

(Chaired by Neil Fitzgerald)

- Florian Fink, Klaus U. Schulz and Uwe Springmann. Profiling of OCR’ed Historical Texts Revisited
- Alicia González Martínez, Tillmann Feige and Thomas Eich. Clear-cut methodology for Arabic OCR and post-correction with low technical skilled annotators
- Harald Hammarström, Shafqat Virk and Markus Forsberg. Poor Man’s OCR Post-Correction: Unsupervised Recognition of Variant Spelling Applied to a Multilingual Document Collection
- Manuel Ayuso. OCR of a mixed corpus: early printings and manuscripts of Martianus Capella’s work

15:30 – 16:00: Coffee break

16:00 – 17:30 Session 4. Natural Language Processing on Latin and Greek

(Chaired by Greta Franzini)

- Marco Budassi and Marco Passarotti. The Impact of Unassimilated Loanwords on Latin Lexicon. A Qualitative and Quantitative Analysis
- Corien Bary, Peter Berck and Iris Hendrickx. A Memory-Based Lemmatizer for Ancient Greek
- Herbert Lange. Implementation of a Latin Grammar in Grammatical Framework
- Eleonora Litta, Marco Passarotti and Paolo Ruffolo. Node Formation. Using Networks to Inspect Productivity in Affixal Derivation in Classical Latin

17:30 – 18:15 Poster session

- B1: C. Clausner, C. Papadopoulos, S. Pletschacher, A. Antonacopoulos. The ENP Image and Ground Truth Dataset of Historical Newspapers
- B2: C. Papadopoulos, S. Pletschacher, C. Clausner, A. Antonacopoulos. The IMPACT Dataset of Historical Document Images
- B3: Cătălina Mărănduc, Augusto Perez and Victoria Bobicev. Building a Corpus to Study the Historical and Geographical Variation of Romanian Language
- B4: So Miyagawa, Kirill Bulert and Marco Büchler. Utilization of Common OCR Tools for Typeset Coptic Texts
- B5: Markus Paluch, Franz Mertins, Simone Rebora, Gabriela Rotari, Christina Schmidt, Benedict Spermoser, Ronald Weller, Maximilian Weß and J. Berenike Herrmann. https://kolimo.uni-goettingen.de – Building a Corpus of Modernist Literary Texts
- B6: Emily Franzini, Greta Franzini, Gabriela Rotari, Franziska Pannach, Mahdi Solhdoust, Marco Büchler. The digital breadcrumb trail of Brothers Grimm
- B7: Jim Salmons and Timlynn Babitsky. The MAGAZINE #GTS format, an integrated document structure and content depiction model supporting eResearch and machine-learning at the Internet Archive
- B8: Karen Thöle. Digital means for the presentation and evaluation of a 15th century liturgical book
- B9: Maria Moritz, Marco Büchler. Non-Literal Text Reuse in Historical Texts: An Approach to Identify Reuse Transformations and its Application to Bible Reuse
- B10: C. Clausner, S. Pletschacher, A. Antonacopoulos. Efficient OCR Training Data Generation with Aletheia
- B11: Christian Clausner. Overview on a number of document analysis tools, ranging from ground truth production to performance evaluation

19:00 Dinner

Friday, June 2nd

8:30: Registration

9:00 – 10:30 Session 5. Infrastructure and Linked Open Data

(Chaired by Tomasz Parkola)

- Péter Király. Towards an extensible measurement of metadata quality
- Christophe Onambélé, Matyáš Kopp, Marco Passarotti and Jiří Mírovský. Converting Latin Treebank Data into SQL Database for Query Purposes
- Thierry Declerck and Lisa Schäfer. Porting past classification schemes for narratives to a Linked Data Framework
- Simone Rebora. A Software Pipeline for the Reception of Italian Literature in Nineteenth-Century England. Preliminary Testing

10:30 – 10:45 Best Paper Award Ceremony

10:45 – 11:15 Coffee break

11:15 – 12:45 Session 6. Digitisation & Layout Analysis

(Chaired by Apostolos Antonacopoulos)

- Christian Reul, Uwe Springmann and Frank Puppe. LAREX – A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books
- Svetlana Cojocaru, Ludmila Malahov and Alexandru Colesnicov. Digitization of Old Romanian Texts Printed in the Cyrillic Script
- Christian Clausner, Justin Hayes, Apostolos Antonacopoulos and Stefan Pletschacher. Unearthing the Recent Past: Digitising and Understanding Statistical Information from Census Tables
- Christian Reul, Marco Dittrich and Martin Gruner. Case Study of a highly automated Layout Analysis and OCR of an incunabulum: ‘Der Heiligen Leben’ (1488)

12:45 – 13:30 Lunch break

13:30 – 15:00 Session 7. Spatial Analysis

(Chaired by Marco Büchler)

- Kimmo Kettunen and Teemu Ruokolainen. Names, Right or Wrong: Named Entities in an OCRed Historical Finnish Newspaper Collection
- Rebecca Benefiel, Sara Sprenkle, Holly Sypniewski and Jamie White. Ancient Graffiti Project: Geo-Spatial Visualization and Search Tools for Ancient Handwritten Inscriptions
- Gustavo Candela, Maria Pilar Escobar Esteban and Manuel Marco-Such. Semantic Enrichment on Cultural Heritage collections: A case study using geographic information
- Mariona Coll Ardanuy and Caroline Sporleder. Weakly-supervised toponym disambiguation in historical documents using semantic and geographic features

15:00 – 15:30: Coffee break

15:30 – 16:30 Final Panel