The COST Action Distant Reading for European Literary History is organizing a Training School in Budapest, co-located with the DH_Budapest 2019 conference.

Key information

  • Dates: Monday, September 23 (1:30 pm) to Wednesday, September 25 (noon), 2019
  • Trainers: see below
  • Location: Budapest, Hungary
  • Contact persons: Roxana Patras (Training Schools Coordinator), Christof Schöch (Action Chair)
  • Keynote: Katherine Bode (Australian National University)
  • Fee: there is no fee for participation

Background information

Training School Tracks

There are three parallel tracks, each focusing on a different topic, as well as several joint sessions. All participants must choose one track to follow and are expected to attend all three days of the Training School.

TRACK 1: Corpus design and text contribution for ELTeC
Coordinator: Carolin Odebrecht
Trainers: Carolin Odebrecht (Würzburg/Berlin), Christian Reul (Würzburg), Lou Burnard (Oxford), Martina Scholger (Graz)

The aim of this track is to give participants hands-on experience in creating TEI-XML versions of source texts that comply with the guidelines of the European Literary Text Collection (ELTeC), starting either from scanned page images or from a preexisting HTML version. We will supply a set of raw materials for participants to work on, along with detailed instructions. By the end of the track sessions, each participant should be able to contribute new TEI-encoded texts to the ELTeC GitHub repository.
For further information, see:

TRACK 2: Natural Language Processing for Distant Reading
Coordinator: Mike Kestemont
Trainers: Andrew Janco (Haverford), David Lassner (Berlin), Leonard Konle (Würzburg)

This track will focus on the application of natural language processing (NLP) to the distant reading of large corpora of literary fiction. In an introductory part on Python, we will review the core concepts of the language that novice users need in order to follow along. In the second part, we will introduce the spaCy library for natural language processing and offer a hands-on introduction to its suite of tools (Part-of-Speech tagging, Named Entity Recognition, Sentiment Analysis, etc.). In the final session, we will delve into fine-tuning existing models on historical data, using real-world data from the ELTeC collection.

TRACK 3: Canonization in Distant Reading Research
Coordinator: Antonija Primorac
Trainers: Marijan Dovic (Ljubljana), Karina van Dalen-Oskam (Amsterdam), Christof Schöch (Trier)

This track endeavors to answer the question “Is the Canon a Theorem?” by introducing participants (literary historians and others) to the specific topic of canonization in Computational Literary Studies (CLS). The discussions will address some of the most established notions about canonization, aspects of canon formation in different national literary traditions, and new ways of tackling “the literary canon” from a CLS viewpoint.

Keynote by Katherine Bode

We are very pleased to announce that Dr. Katherine Bode (Australian National University, Canberra) will deliver a keynote, open to all Training School participants and Action members. The keynote is titled “Living or dead? What is the corpus of digital literary studies?”. It will take place on Tuesday, September 24, 2019, from 5:15 pm to 6:45 pm. Location to be confirmed.