Video Presentations

DATeCH2021

Welcome and Keynote Speaker session

Frieda Steurs, INT
Organisers' welcome
Reinhard Altenhöner, SBB
The next exercise for libraries: data enrichment and analysis as a key technology for new tasks and offerings

Session 1. Evaluation and improvement of OCR

Matthias Boenig, Konstantin Baierer, Volker Hartmann, Maria Federbusch and Clemens Neudecker
Labelling OCR Ground Truth for Usage in Repositories
Anna-Maria Sichani, Panagiotis Kaddas, George K. Mikros and Basilis Gatos
OCR for Greek polytonic (multi accent) historical printed documents: development, optimization and quality control
Tobias Englmeier, Florian Fink and Klaus Schulz
A-I-PoCoTo - Combining Automated and Interactive Postcorrection of OCR results

Session 2. Applications

Emad Mohamed and Zeeshan Ali Sayyed
Arabic-SOS: Segmentation, Stemming, and Orthography Standardization for Classical and pre-Modern Standard Arabic
Christian Reul, Sebastian Göttel, Uwe Springmann, Christoph Wick, Kay-Michael Würzner and Frank Puppe
Automatic Semantic Text Tagging on Historical Lexica by Combining OCR and Typography Classification
Juri Opitz, Leo Born, Vivi Nastase and Yannick Pultar
Automatic Reconstruction of Emperor Itineraries from the Regesta Imperii
Karin Hofmeester, Ashkan Ashkpour, Katrien Depuydt and Jesse de Does
Diamonds in Borneo: Commodities as Concepts in Context

Session 3. OCR and HTR in practice

Clemens Neudecker, Konstantin Baierer, Maria Federbusch, Kay-Michael Würzner, Matthias Boenig, Elisa Hermann and Volker Hartmann
OCR-D: An end-to-end open-source OCR framework for historical documents
Kimmo Kettunen, Teemu Ruokolainen, Erno Liukkonen, Pierrick Tranouez, Daniel Antelme and Thierry Paquet
Detecting Articles in a Digitized Finnish Historical Newspaper Collection 1771–1929: Early Results Using the PIVAJ Software
Christian Clausner, Apostolos Antonacopoulos, Christy Henshaw and Justin Hayes
Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study
Arnau Baró, Jialuo Chen, Alicia Fornés and Beáta Megyesi
Towards a generic unsupervised method for transcription of encoded manuscripts

Session 4. Digitisation of historical languages

Thomas Milo and Alicia González Martínez
A New Strategy for Arabic OCR: Archigraphemes, Letter Blocks, Script Grammar, and shape synthesis
Senka Drobac, Pekka Kauppinen and Krister Lindén
Improving OCR of historical newspapers and journals published in Finland

Session 5. Access to data

Anne Gorter, Edwin Klijn, Rutger Van Koert, Marielle Scherer and Ismee Tames
Tribunal Archives as Digital Research Facility (TRIADO): new ways to make archives accessible and useable
Tom Derrick and Nora McGregor
Cross-disciplinary collaborations to enrich access to non-Western language material in the Cultural Heritage sector
Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje
Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench
Evagelos Varthis, Marios Poulos, Ilias Yarenis and Sozon Papavlasopoulos
Implementation of a Databaseless Web REST API for the Unstructured Texts of Migne's Patrologia Graeca with Searching capabilities and additional Semantic and Syntactic expandability

Session 6. Natural language processing

Jeremi Ochab and Holger Essler
Stylometry of literary papyri
Sandra Young
Using lexicography to characterise relations between species mentions in the biodiversity literature

Session 7. Metadata

Péter Király
Validating 126 million MARC records
Katrien Depuydt and Hennie Brugman
Turning Digitised Material into a Diachronic Corpus: Metadata Challenges in the Nederlab Project