Satellite workshops

Tuesday, May 30th

8:30: Registration

10:00 – 16:00 The journey from physical to digital and advancements in cultural heritage digitisation

9:00 – 18:00 TRACER tutorial for computational text reuse detection

13:00 – 17:00 TextGrid user workshop

18:15 – 19:30: GCDH Evening Lectures: Sara Tonelli (FBK-Trento): “NLP for Historical Content Analysis: Ongoing work and Open challenges”

Wednesday, May 31st

8:30: Registration

9:00 – 16:30 Handwritten Text Recognition – Transkribus Workshop (project READ)

13:00 – 17:00 PoCoTo user workshop

9:00 – 17:00 IMPACT Members Meeting (only for IMPACT members)

Main conference

Thursday, June 1st

8:30: Registration

9:00 – 9:15: Conference Opening

9:15 – 10:45: Session 1. Transcription

(Chaired by Sinai Rusinek)

    • Jesper Zedlitz and Norbert Luttenberger. 750 Volunteers Transcribing 31,000 Pages with 8.5 million Entries Online – an Evaluation

    • Enrique Manjavacas and Peter Petre. Enabling Annotation of Historical Corpora in an Asynchronous Collaborative Environment

    • Manuel Burghardt and Sebastian Spanner. Allegro: User-centered Design of a Tool for the Crowdsourced Transcription of Handwritten Music Scores

    • Jesper Zedlitz and Norbert Luttenberger. Enhancing Human-Transcribed Records by Using OCR

10:45 – 11:15: Coffee break

11:15 – 13:05 Session 2. Natural Language Processing

(Chaired by Klaus Schulz)

    • Filip Graliński, Rafał Jaworski, Łukasz Borchmann and Piotr Wierzchoń. The RetroC challenge: how to guess the publication year of a text?

    • Cătălina Mărănduc, Cătălin Mititelu and Radu Simionescu. Parsing Romanian Specialized Dictionaries Structured in Nests

    • Markus Paluch, Gabriela Rotari, David Steding, Maximilian Weß, Maria Moritz and Marco Büchler. Analysis of part-of-speech tagging of historical German texts

    • Alessio Salomoni. Dependency Parsing on Late-18th-Century German Aesthetic Writings. A Preliminary Inquiry into Schiller and F. Schlegel.

    • Gustavo Candela, Maria Pilar Escobar Esteban and Borja Navarro-Colorado. In search of Poetic Rhythm: Poetry retrieval through text and metre

13:05 – 13:15: DARIAH presentations

    • Mike Mertens. Dariah-EU.

    • Stefan Schmunck. Dariah-DE.

13:15 – 14:00: Lunch break

14:00 – 15:30: Session 3. OCR and Postprocessing

(Chaired by Neil Fitzgerald)

    • Florian Fink, Klaus U. Schulz and Uwe Springmann. Profiling of OCR’ed Historical Texts Revisited

    • Alicia González Martínez, Tillmann Feige and Thomas Eich. Clear-cut methodology for Arabic OCR and post-correction with low technical skilled annotators

    • Harald Hammarström, Shafqat Virk and Markus Forsberg. Poor Man’s OCR Post-Correction: Unsupervised Recognition of Variant Spelling Applied to a Multilingual Document Collection

    • Manuel Ayuso. OCR of a mixed corpus: early printings and manuscripts of Martianus Capella’s work

15:30 – 16:00: Coffee break

16:00 – 17:30 Session 4. Natural Language Processing on Latin and Greek

(Chaired by Greta Franzini)

    • Marco Budassi and Marco Passarotti. The Impact of Unassimilated Loanwords on Latin Lexicon. A Qualitative and Quantitative Analysis

    • Corien Bary, Peter Berck and Iris Hendrickx. A Memory-Based Lemmatizer for Ancient Greek

    • Herbert Lange. Implementation of a Latin Grammar in Grammatical Framework

    • Eleonora Litta, Marco Passarotti and Paolo Ruffolo. Node Formation. Using Networks to Inspect Productivity in Affixal Derivation in Classical Latin

17:30 – 18:15 Poster session

    • B1: C. Clausner, C. Papadopoulos, S. Pletschacher, A. Antonacopoulos. The ENP Image and Ground Truth Dataset of Historical Newspapers

    • B2: C. Papadopoulos, S. Pletschacher, C. Clausner, A. Antonacopoulos. The IMPACT Dataset of Historical Document Images

    • B3: Cătălina Mărănduc, Augusto Perez and Victoria Bobicev. Building a Corpus to Study the Historical and Geographical Variation of Romanian Language

    • B4: So Miyagawa, Kirill Bulert and Marco Büchler. Utilization of Common OCR Tools for Typeset Coptic Texts

    • B5: Markus Paluch, Franz Mertins, Simone Rebora, Gabriela Rotari, Christina Schmidt, Benedict Spermoser, Ronald Weller, Maximilian Weß and J. Berenike Herrmann. Building a Corpus of Modernist Literary Texts

    • B6: Emily Franzini, Greta Franzini, Gabriela Rotari, Franziska Pannach, Mahdi Solhdoust, Marco Büchler. The digital breadcrumb trail of Brothers Grimm

    • B7: Jim Salmons and Timlynn Babitsky. The MAGAZINE #GTS format, an integrated document structure and content depiction model supporting eResearch and machine-learning at the Internet Archive

    • B8: Karen Thöle. Digital means for the presentation and evaluation of a 15th century liturgical book

    • B9: Maria Moritz, Marco Büchler. Non-Literal Text Reuse in Historical Texts: An Approach to Identify Reuse Transformations and its Application to Bible Reuse

    • B10: C. Clausner, S. Pletschacher, A. Antonacopoulos. Efficient OCR Training Data Generation with Aletheia

    • B11: Christian Clausner. Overview on a number of document analysis tools, ranging from ground truth production to performance evaluation

19:00 Dinner

Friday, June 2nd

8:30: Registration

9:00 – 10:30 Session 5. Infrastructure and Linked Open Data

(Chaired by Tomasz Parkola)

    • Péter Király. Towards an extensible measurement of metadata quality

    • Christophe Onambélé, Matyáš Kopp, Marco Passarotti and Jiří Mírovský. Converting Latin Treebank Data into SQL Database for Query Purposes

    • Thierry Declerck and Lisa Schäfer. Porting past classification schemes for narratives to a Linked Data Framework

    • Simone Rebora. A Software Pipeline for the Reception of Italian Literature in Nineteenth-Century England. Preliminary Testing

10:30 – 10:45 Best Paper Award Ceremony

10:45 – 11:15 Coffee break

11:15 – 12:45 Session 6. Digitisation & Layout Analysis

(Chaired by Apostolos Antonacopoulos)

    • Christian Reul, Uwe Springmann and Frank Puppe. LAREX – A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books

    • Svetlana Cojocaru, Ludmila Malahov and Alexandru Colesnicov. Digitization of Old Romanian Texts Printed in the Cyrillic Script

    • Christian Clausner, Justin Hayes, Apostolos Antonacopoulos and Stefan Pletschacher. Unearthing the Recent Past: Digitising and Understanding Statistical Information from Census Tables

    • Christian Reul, Marco Dittrich and Martin Gruner. Case Study of a highly automated Layout Analysis and OCR of an incunabulum: ‘Der Heiligen Leben’ (1488)

12:45 – 13:30 Lunch break

13:30 – 15:00 Session 7. Spatial Analysis

(Chaired by Marco Büchler)

    • Kimmo Kettunen and Teemu Ruokolainen. Names, Right or Wrong: Named Entities in an OCRed Historical Finnish Newspaper Collection

    • Rebecca Benefiel, Sara Sprenkle, Holly Sypniewski and Jamie White. Ancient Graffiti Project: Geo-Spatial Visualization and Search Tools for Ancient Handwritten Inscriptions

    • Gustavo Candela, Maria Pilar Escobar Esteban and Manuel Marco-Such. Semantic Enrichment on Cultural Heritage collections: A case study using geographic information

    • Mariona Coll Ardanuy and Caroline Sporleder. Weakly-supervised toponym disambiguation in historical documents using semantic and geographic features

15:00 – 15:30: Coffee break

15:30 – 16:30 Final Panel