Conference program

Pre-Conference Workshop - 8th May

09:00 – 09:30 | Registration and coffee

09:30 – 10:00 | Welcome, introductions and workshop aims

10:00 – 11:00 | Workshop session 1: Getting access to the data

11:00 – 11:30 | Coffee

11:30 – 12:30 | Workshop session 2: Data Preparation

12:30 – 13:30 | Lunch

13:30 – 15:30 | Workshop session 3: Data Analysis – Part 1

15:30 – 16:00 | Coffee

16:00 – 18:00 | Workshop session 4: Data Analysis – Part 2

18:00 – 18:30 | Wrap-up

19:00 – 21:00 | Social Drinks @ Local Pub (self-paid)

Conference Day 1 - 9th May

08:30 – 09:00 | Registration & coffee

09:00 – 09:15 | Conference Opening: Organisers' welcome

09:15 – 10:00 | Keynote speaker

10:00 – 10:30 | Coffee break with exhibits

10:30 – 12:00 | Session 1 - Evaluation and improvement of OCR

  • Matthias Boenig, Konstantin Baierer, Volker Hartmann, Maria Federbusch and Clemens Neudecker. Labelling OCR Ground Truth for Usage in Repositories

  • Anna-Maria Sichani, Panagiotis Kaddas, George K. Mikros and Basilis Gatos. OCR for Greek polytonic (multi accent) historical printed documents: development, optimization and quality control

  • Hsiang-An Wang and Pin-Ting Liu. Towards a Higher Accuracy of Optical Character Recognition of Chinese Rare Books in Making Use of Text Model

  • Tobias Englmeier, Florian Fink and Klaus Schulz. A-I-PoCoTo - Combining Automated and Interactive Postcorrection of OCR results

12:00 – 13:00 | Lunch

13:00 – 14:30 | Session 2 - Applications

  • Emad Mohamed and Zeeshan Ali Sayyed. Arabic-SOS: Segmentation, Stemming, and Orthography Standardization for Classical and pre-Modern Standard Arabic

  • Christian Reul, Sebastian Göttel, Uwe Springmann, Christoph Wick, Kay-Michael Würzner and Frank Puppe. Automatic Semantic Text Tagging on Historical Lexica by Combining OCR and Typography Classification

  • Juri Opitz, Leo Born, Vivi Nastase and Yannick Pultar. Automatic Reconstruction of Emperor Itineraries from the Regesta Imperii

  • Karin Hofmeester, Ashkan Ashkpour, Katrien Depuydt and Jesse de Does. Diamonds in Borneo: Commodities as Concepts in Context

14:30 – 15:00 | Coffee break with exhibits

15:00 – 16:00 | Poster session (find more info below)

16:00 – 17:30 | Session 3 - OCR and HTR in practice

  • Clemens Neudecker, Konstantin Baierer, Maria Federbusch, Kay-Michael Würzner, Matthias Boenig, Elisa Hermann and Volker Hartmann. OCR-D: An end-to-end open-source OCR framework for historical documents

  • Kimmo Kettunen, Teemu Ruokolainen, Erno Liukkonen, Pierrick Tranouez, Daniel Antelme and Thierry Paquet. Detecting Articles in a Digitized Finnish Historical Newspaper Collection 1771–1929: Early Results Using the PIVAJ Software

  • Christian Clausner, Apostolos Antonacopoulos, Christy Henshaw and Justin Hayes. Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study

  • Arnau Baró, Jialuo Chen, Alicia Fornés and Beáta Megyesi. Towards a generic unsupervised method for transcription of encoded manuscripts

From 19:30 | Conference Dinner @ La Manufacture (Rue Notre Dame du Sommeil 12 - 1000 Brussels)

Conference Day 2 - 10th May

08:30 – 09:00 | Registration & coffee

09:00 – 10:30 | Session 4 - Digitisation of historical languages

  • Bruno Bon and Laura Vangone. Challenges of Mass OCR-isation of Medieval Latin Texts in a Resource-Limited Project

  • Eliese-Sophia Lincke, Marco Büchler and Kirill Bulert. Optical Character Recognition for Coptic. A multi-source approach for scholarly editions

  • Thomas Milo and Alicia González Martínez. A New Strategy for Arabic OCR: Archigraphemes, Letter Blocks, Script Grammar, and shape synthesis

  • Senka Drobac, Pekka Kauppinen and Krister Lindén. Improving OCR of historical newspapers and journals published in Finland

10:30 – 11:00 | Coffee break with exhibits

11:00 – 12:30 | Session 5 - Access to data

  • Anne Gorter, Edwin Klijn, Rutger Van Koert, Marielle Scherer and Ismee Tames. Tribunal Archives as Digital Research Facility (TRIADO): new ways to make archives accessible and useable

  • Tom Derrick and Nora McGregor. Cross-disciplinary collaborations to enrich access to non-Western language material in the Cultural Heritage sector

  • Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje. Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench

  • Evagelos Varthis, Marios Poulos, Ilias Yarenis and Sozon Papavlasopoulos. Implementation of a Databaseless Web REST API for the Unstructured Texts of Migne's Patrologia Graeca with Searching capabilities and additional Semantic and Syntactic expandability

12:30 – 13:30 | Lunch at exhibits

13:30 – 15:00 | Session 6 - Natural language processing

  • Helmut Schmid. Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts

  • Jeremi Ochab and Holger Essler. Stylometry of literary papyri

  • Sandra Young. Using lexicography to characterise relations between species mentions in the biodiversity literature

  • Giuseppe Celano. Standoff Annotation for the Ancient Greek and Latin Dependency Treebank

15:00 – 15:30 | Coffee break with exhibits

15:30 – 16:45 | Session 7 - Metadata

  • Liviu Pop. Hidden Metadata in Plain Sights: Romanian Folklore Catalogues

  • Péter Király. Validating 126 million MARC records

  • Katrien Depuydt and Hennie Brugman. Turning Digitised Material into a Diachronic Corpus: Metadata Challenges in the Nederlab Project

16:45 – 17:00 | Best paper award ceremony

17:00 – 17:45 | Panel discussion

17:45 – 18:00 | Conference closing

Poster Session

  • Mª Isabel Rodríguez Fidalgo, Adriana Paíno Ambrosio and Diego A Burgos. «Omnium scientiarum princeps Salmantica docet»: An immersive 360º experience

  • Zdeněk Uhlíř, Olga Čiperová, Tomáš Klimek and Tomáš Psohlavec. The Fragmentation of the Contents of Historical Text Editions in the Manuscriptorium Digital Library Environment

  • Wouter Termont, Lorenz Demey and Hans Smessaert. First Steps Toward a Digital Database of Aristotelian Diagrams

  • Marieke Meelen and Christopher Handy. Intelligent Agents and Genetic Algorithms for Tibetan and Chinese Tagging and Alignment

  • Ben Companjen, Peter Verhaar, Koenraad Donker van Heel, Ferdinand Harmsen and Juan José Archidona Ramírez. Piloting the Abnormal Hieratic Global Portal

  • Vanessa Hannesschläger. «Retro-editing»: The edition of an edition of the Karl Kraus legal papers

  • Hadewijch Masure. Itinera Nova: an ambitious digitization and disclosure of the Leuven Bench of Aldermen archives

  • Catalina Maranduc, Victoria Bobicev and Roman Untilov. Syntactic Parser for Old and Regional Romanian

  • Roxanne Wyns and An Smets. International Image Interoperability Framework @ KU Leuven (Belgium). Current applications and future projects

  • Francesco Gelati. Selective Harvester: Harvesting and Managing Archival Descriptions as XML-EAD files

  • Soumya Mohanty and David Smith. Alignment-Based Training for Detecting Reader Annotations in Printed Books

  • Błażej Betański, Mateusz Matela, Maciej Mikuła and Tomasz Parkoła. Text collation in the dataset of the sources of the old law

  • Jim Salmons and Timlynn Babitsky. #MAGAZINEgts and #dhSegment: Using a Metamodel Subgraph to Generate Synthetic Data of Under-Sampled Complex Document Structures for Machine-Learning

  • Kimmo Kettunen, Mika Koistinen and Jukka Kervinen. Tidying up the Mess – on a Way to Improved Quality in a Historical Finnish Newspaper and Journal Collection 1771–1910

  • Nathanael Philipp and Maximilian Bryan. Evaluation of CNN architectures for text detection in historical maps

  • Catalina Maranduc, Ludmila Malahov and Mihaela Marin. Alignment of the Romanian Oldest New Testament

  • Shu Jiun Chen. Semantic Enrichment of Linked Biography Data for Digital Humanities

  • Sinai Rusinek and Nurit Greidinger. No Tabula Rasa: Digitizing Historical Newspapers here and now

  • Bijayananda Pradhan and Kotrayya Agadi. Big Data Application in Academic Libraries: status study

  • Ryma Benabdelaziz, Djamel Gaceb and Mohammed Haddad. Word Spotting in Historical Handwritten document Images based on Texture features in Spatial Context