There is growing concern about optimising the resources available for the creation, transformation, and dissemination of digitised textual content. The digital preservation of cultural heritage calls for interdisciplinary collaboration between researchers and practitioners in order to foster innovative approaches to the digitisation and exploitation of cultural content. The DATeCH (Digital Access to Textual Cultural Heritage) conference is conceived as a forum to present, discuss and showcase techniques which could potentially lead to:

  • Improved text recognition and special text recognition techniques for historical documents.

  • Innovative views and tools for the exploitation of digital content by both expert and non-expert communities in the humanities.

  • Advanced tools for higher productivity and quality in the creation of useful digital content.

  • New technologies to classify historical texts and sort large collections according to genre, language, time and place of origin, text recognition quality, thematic fields, author, etc.

  • Improved treatment of historical languages (diachronic language development) and multilingualism.

  • Advanced toolkits for querying and exploiting (clean or noisy) historical corpora and other linguistic resources.

  • New mining techniques on historical text collections (addressing, e.g., historical text re-use, or person and event detection).

  • Procedures for enriching, structuring, annotating and interlinking historical texts and reference data.


We welcome submissions on the following topics:

  • Optical Character Recognition (OCR) and/or Handwritten Text Recognition (HTR) technology and tools for minority and historical languages, including dialects

  • Methods and tools for post-correction of OCR and/or HTR results

  • Document layout analysis, document understanding

  • Automated quality control for mass OCR and/or HTR data

  • Innovative access methods for historical texts and corpora

  • Natural language processing of ancient languages (e.g. Latin, Greek, Arabic, Coptic …)

  • Visualisation techniques and interfaces for search and research in digital humanities

  • Publication and retrieval on e-books and mobile devices

  • Crowdsourcing techniques for collecting and annotating data in digital humanities

  • Enrichment of and metadata production for historical texts and corpora

  • Data created with mobile devices

  • Data presentation and exploration on mobile devices

  • Ontology-based and linked-data-based contextualisation of digitised and born-digital scholarly data resources

Target Audience

The conference aims to foster interdisciplinary work and to connect participants working in the following areas:

  • Text digitisation, Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR)

  • Digital Humanities, Digital Cultural Heritage

  • Image and document analysis

  • Digital libraries, library and archival science

  • Museum and Heritage Studies

  • Applied computational linguistics

  • Crowdsourcing

  • Interfaces and human-computer interaction