Würzburg Training School

Training School: “Optical Character Recognition and Text Encoding for the production of ELTeC”

This Training School covers Optical Character Recognition (OCR) and text encoding for the production of contributions to the ELTeC (European Literary Text Collection). The ELTeC will be a principled sample of European literary production, comprising the full text of novels published between 1850 and 1920 in each of many European languages.

The aim of the Training School is to enable a group of Action participants to go from a novel in the form of a scanned book to a TEI-encoded full-text version of that novel. The target audience is researchers from participating countries who are interested in contributing texts to the ELTeC but who first need to digitize those texts, and who are not yet sufficiently familiar with the practicalities of Optical Character Recognition for full-text generation or with the fundamentals of encoding texts according to the Guidelines of the Text Encoding Initiative.

All participants are expected to attend both days of the Training School.

Key information

Dates: Monday, April 16 to Tuesday, April 17, 2018 (both days all day)
Location: University of Würzburg, Germany
Local organizers: Leonard Konle (leonard.konle@uni-wuerzburg.de) and Fotis Jannidis
Contact persons: Carolin Odebrecht (carolin.odebrecht@hu-berlin.de), Christof Schöch (schoech@uni-trier.de)
Trainers: Christian Reul, Leonard Konle, Lou Burnard
Background information: https://github.com/distantreading/WG1/wiki

Programme outline

Monday, April 16, 2018

Location: Philosophische Fakultät, Room 6.E.8, Google Maps: https://goo.gl/maps/Lx7B3dRRpMs

09:15 Welcome to all participants
09:30 OCR basics
10:30 Hands-on OCR with ABBYY FineReader
12:00 Lunch
13:00 Demo of OCRopus
15:30 Anna Řehořková, “Digitization practice in the Czech National Corpus” (talk)
15:45 Coffee, cookies and questions
19:30 Dinner at “Alter Kranen”
Address: Kranenkai 1, https://goo.gl/maps/iHyVPQCTQPE2

Tuesday, April 17, 2018

Location: Philosophische Fakultät, Room 3.E.3, Google Maps: https://goo.gl/maps/Lx7B3dRRpMs

09:30 An introduction to the Text Encoding Initiative and to the ELTeC encoding principles
12:00 Lunch
13:00 Practical work on converting texts from OCR output to XML-TEI and on encoding texts for the ELTeC
For details, see: https://distantreading.github.io/Training/programme.html
