DATeCH2017 - WS5

PoCoTo User Workshop

Tutor(s): Dr. Uwe Springmann, Florian Fink

Date: May 31, 2017, 1 PM – 5 PM

Location: GCDH, Seminar Room 1 (ground floor), Heyne Haus, Papendiek 16, 37073 Göttingen, Germany

Abstract

PoCoTo (PostCorrectionTool) is an open-source tool for the interactive post-correction of OCR'ed data that has been developed by the CIS of the LMU Munich. This workshop will teach participants both the basic usage of PoCoTo for correcting OCR results and more advanced topics, such as using the language profiler to generate correction suggestions that take the historical spelling variation of the document language into account.

The first part of the workshop will guide users through installing PoCoTo and creating a first project.

Afterward, the basic techniques for correcting documents with PoCoTo will be introduced. These include simple corrections, corrections of merged and split tokens, and the correction of concordance series.

In the second part of the workshop, the language profiler will be introduced. Users will be guided through profiling an example project in PoCoTo, and the use of the generated correction suggestions for post-correction will be shown.

For active participation, a laptop running either Windows or Linux with an up-to-date Java installation is recommended. Additional content for the exercises is available for download and will also be provided offline to workshop participants.