Accepted papers

DATeCH 2017

DATeCH 2017 proceedings are available at:

Oral presentation and inclusion in conference proceedings

  • Marco Budassi and Marco Passarotti. The Impact of Unassimilated Loanwords on Latin Lexicon. A Qualitative and Quantitative Analysis

  • Kimmo Kettunen and Teemu Ruokolainen. Names, Right or Wrong: Named Entities in an OCRed Historical Finnish Newspaper Collection

  • Candela Gustavo, Maria Pilar Escobar Esteban and Borja Navarro-Colorado. In search of Poetic Rhythm: Poetry retrieval through text and metre

  • Jesper Zedlitz and Norbert Luttenberger. 750 Volunteers Transcribing 31,000 Pages with 8.5 million Entries Online – an Evaluation

  • Christophe Onambélé, Matyáš Kopp, Marco Passarotti and Jiří Mírovský. Converting Latin Treebank Data into SQL Database for Query Purposes

  • Catalina Maranduc, Cătălin Mititelu and Radu Simionescu. Parsing Romanian Specialized Dictionaries Structured in Nests

  • Enrique Manjavacas and Peter Petre. Enabling Annotation of Historical Corpora in an Asynchronous Collaborative Environment

  • Candela Gustavo, Maria Pilar Escobar Esteban and Manuel Marco-Such. Semantic Enrichment on Cultural Heritage collections: A case study using geographic information

  • Alessio Salomoni. Dependency Parsing on Late-18th-Century German Aesthetic Writings. A Preliminary Inquiry into Schiller and F. Schlegel.

  • Eleonora Litta, Marco Passarotti and Paolo Ruffolo. Node Formation. Using Networks to Inspect Productivity in Affixal Derivation in Classical Latin

  • Alexandru Colesnicov, Malahov Ludmila and Svetlana Cojocaru. Digitization of Old Romanian Texts Printed in the Cyrillic Script

  • Jesper Zedlitz and Norbert Luttenberger. Enhancing Human-Transcribed Records by Using OCR

  • Filip Graliński, Rafał Jaworski, Łukasz Borchmann and Piotr Wierzchoń. The RetroC challenge: how to guess the publication year of a text?

  • Florian Fink, Klaus U. Schulz and Uwe Springmann. Profiling of OCR’ed Historical Texts Revisited

  • Christian Reul, Uwe Springmann and Frank Puppe. LAREX – A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books

  • Christian Reul, Marco Dittrich and Martin Gruner. Case Study of a highly automated Layout Analysis and OCR of an incunabulum: ‘Der Heiligen Leben’ (1488)

  • Mariona Coll Ardanuy and Caroline Sporleder. Weakly-supervised toponym disambiguation in historical documents using semantic and geographic features

  • Corien Bary, Peter Berck and Iris Hendrickx. A Memory-Based Lemmatizer for Ancient Greek

  • Manuel Burghardt and Sebastian Spanner. Allegro: User-centered Design of a Tool for the Crowdsourced Transcription of Handwritten Music Scores

  • Simone Rebora. A Software Pipeline for the Reception of Italian Literature in Nineteenth-Century England. Preliminary Testing

  • Alicia González Martínez, Tillmann Feige and Thomas Eich. Clear-cut methodology for Arabic OCR and post-correction with low technical skilled annotators

  • Holly Sypniewski, Rebecca Benefiel, Sara Sprenkle and Jamie White. Ancient Graffiti Project: Geo-Spatial Visualization and Search Tools for Ancient Handwritten Inscriptions

  • Thierry Declerck and Lisa Schäfer. Porting past classification schemes for narratives to a Linked Data Framework

  • Christian Clausner, Justin Hayes, Apostolos Antonacopoulos and Stefan Pletschacher. Unearthing the Recent Past: Digitising and Understanding Statistical Information from Census Tables

  • Harald Hammarström, Shafqat Virk and Markus Forsberg. Poor Man’s OCR Post-Correction: Unsupervised Recognition of Variant Spelling Applied to a Multilingual Document Collection

  • Herbert Lange. Implementation of a Latin Grammar in Grammatical Framework

  • Péter Király. Towards an extensible measurement of metadata quality

  • Manuel Ayuso. OCR of a mixed corpus: early printings and manuscripts of Martianus Capella’s work

  • Markus Paluch, Gabriela Rotari, David Steding, Maximilian Weß, Maria Moritz and Marco Büchler. Analysis of part-of-speech tagging of historical German texts


    • B1: C. Clausner, C. Papadopoulos, S. Pletschacher, A. Antonacopoulos. The ENP Image and Ground Truth Dataset of Historical Newspapers

    • B2: C. Papadopoulos, S. Pletschacher, C. Clausner, A. Antonacopoulos. The IMPACT Dataset of Historical Document Images

    • B3: Cătălina Mărănduc, Augusto Perez and Victoria Bobicev. Building a Corpus to Study the Historical and Geographical Variation of Romanian Language

    • B4: So Miyagawa, Kirill Bulert and Marco Büchler. Utilization of Common OCR Tools for Typeset Coptic Texts

    • B5: Markus Paluch, Franz Mertins, Simone Rebora, Gabriela Rotari, Christina Schmidt, Benedict Spermoser, Ronald Weller, Maximilian Weß and J. Berenike Herrmann. Building a Corpus of Modernist Literary Texts

    • B6: Emily Franzini, Greta Franzini, Gabriela Rotari, Franziska Pannach, Mahdi Solhdoust, Marco Büchler. The digital breadcrumb trail of Brothers Grimm

    • B7: Jim Salmons and Timlynn Babitsky. The MAGAZINE #GTS format, an integrated document structure and content depiction model supporting eResearch and machine-learning at the Internet Archive

    • B8: Karen Thöle. Digital means for the presentation and evaluation of a 15th century liturgical book

    • B9: Maria Moritz, Marco Büchler. Non-Literal Text Reuse in Historical Texts: An Approach to Identify Reuse Transformations and its Application to Bible Reuse

    • B10: C. Clausner, S. Pletschacher, A. Antonacopoulos. Efficient OCR Training Data Generation with Aletheia

    • B11: Christian Clausner. Overview on a number of document analysis tools, ranging from ground truth production to performance evaluation.