The COST Action Distant Reading for European Literary History is organizing a Training School in Budapest, co-located with the DH_Budapest 2019 conference.

Key information

  • Dates: Monday, September 23 (1:30 pm) to Wednesday, September 25 (noon), 2019
  • Trainers: see below
  • Location: Budapest, Hungary
  • Contact persons: Roxana Patras (Training Schools Coordinator), Christof Schöch (Action Chair)
  • Keynote: Katherine Bode (Australian National University)
  • Fee: there is no fee for participation

Background information

Training School Tracks

There are three parallel tracks, each focusing on a different topic, as well as several joint sessions. All participants must choose one track to follow and are expected to attend all three days of the Training School.

TRACK 1: Corpus design and text contribution for ELTeC
Coordinator: Carolin Odebrecht
Trainers: Carolin Odebrecht (Würzburg/Berlin), Christian Reul (Würzburg), Lou Burnard (Oxford), Martina Scholger (Graz)

The aim of this track is to give participants hands-on experience in creating TEI-XML versions of source texts that comply with the guidelines of the European Literary Text Collection (ELTeC), starting from scanned page images or from a preexisting HTML version. We will supply a set of raw materials for participants to work on, along with detailed instructions. At the end of our track sessions, each participant should be able to contribute new TEI-encoded texts to the ELTeC GitHub repository.
For further information, see:

TRACK 2: Natural Language Processing for Distant Reading
Coordinator: Mike Kestemont
Trainers: Andrew Janco (Haverford), David Lassner (Berlin), Leonard Konle (Würzburg)

This track will focus on the application of natural language processing (NLP) to the distant reading of large corpora of literary fiction. In an introductory part on Python, we will review some of the core concepts of the language so that novice users can follow along. In the second part, we will introduce the spaCy library for natural language processing and offer a hands-on introduction to its suite of tools (part-of-speech tagging, named entity recognition, sentiment analysis, etc.). In the final session, we will delve into fine-tuning existing models on historical data, using real-world data from the ELTeC collection.

TRACK 3: Canonization in Distant Reading Research
Coordinator: Antonija Primorac
Trainers: Marijan Dovic (Ljubljana), Karina van Dalen-Oskam (Amsterdam), Christof Schöch (Trier)

This track endeavors to answer the question “Is the Canon a Theorem?” by introducing participants (literary historians and others) to the specific topic of canonization in Computational Literary Studies (CLS). The discussions will address some of the most established notions about canonization, aspects of canon formation in different national literary traditions, and new ways of tackling “the literary canon” from a CLS viewpoint.

Keynote by Katherine Bode

We are very pleased to announce that Dr. Katherine Bode (Australian National University, Canberra) will be delivering a keynote, open to all Training School participants and Action members. The title of the keynote will be “Living or dead? What is the corpus of digital literary studies?”. It will take place on Tuesday, September 24, 2019, from 5:15 pm to 6:45 pm. Location to be confirmed.

Is the corpus of digital literary studies a collection (or dataset or system) of books or of texts? Which, if any, would be more lively (or deadly), and why would that matter? This paper considers these questions by discussing a topic rarely raised in digital literary studies, but increasingly prevalent in the broader discipline: empiricism. I explore some of the problems that (digital) literary studies has with materiality and wonder if posthumanist notions of the inseparability of matter and meaning might help to address them.

Biographical note
Katherine Bode is an associate professor of literary and textual studies at the Australian National University. She is the author or co-editor of books including A World of Fiction: Digital Collections and the Future of Literary History (2018), Advancing Digital Humanities: Research, Methods, Theories (2014), Reading by Numbers: Recalibrating the Literary Field (2012) and Resourceful Reading: eResearch, the New Empiricism, and Australian Literary Culture (2009).