Call for Belgrade Training School: “Exploring ELTeC: Use-Cases for Information Extraction and Analysis”



Distant Reading Training School:
Exploring ELTeC: Use-Cases for Information Extraction and Analysis
within the COST Action 16204: Distant Reading for European Literary History

About the Training School

We invite you to the final Training School of the Distant Reading for European Literary History COST Action (CA16204), under the overarching topic of “Exploring ELTeC: Use-Cases for Information Extraction and Analysis”. The Training School consists of nine modules, including three lectures open to the online public, and takes a hybrid form: participants can choose to attend in person or remotely. Because this is a single-track training school, participants are expected to attend all, or at least most, sessions.

The Training School will present and teach hands-on approaches to information extraction and analysis of textual data, specifically the ELTeC corpora developed within the Action. The workshops will cover various aspects of working with Named Entities and Geo-Entities (their recognition, extraction and analysis), working with Wiki-ELTeC data, linking (historical) data with Nodegoat, semantic analysis with word embeddings and language models, and comparing corpora with stylometry.
While there are no formal prerequisites, given the topics and intensity of the workshop we advise that participants have at least very basic computer skills (the ability to install programs and run simple scripts).

Key information
• Date: 22-24 March 2022
• Place: Hybrid (online and in person)
• Trainers:
Christof Schöch (Trier University);
Maciej Eder, Joanna Byszuk and Artjoms Šeļa (Institute of Polish Language, Polish Academy of Sciences); Diana Santos (University of Oslo); Benedikt Perak (University of Rijeka); Fotis Jannidis (Würzburg University); Denis Maurel (Université de Tours, Lifat, Computer Science Research Laboratory); Eric Laporte, Tita Kyriacopoulou (Université Gustave Eiffel, LIGM); Jessie Labov (Central European University); Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić (University of Belgrade)

• Host institution: University of Belgrade, Faculty of Mining and Geology, Belgrade, Serbia
• Organizers: local organizers Ranka Stanković and Cvetana Krstev; Joanna Byszuk, Working Group 2 Leader
• Contact persons: Roxana Patras, Training School Coordinator; Christof Schöch, Action Chair; Ranka Stanković; Joanna Byszuk, Working Group 2 Leader; Diana Santos
• There is no fee for participation
More on Distant Reading action

Target Audience
The target audience consists of researchers, especially Early Career Investigators (ECI), from participating countries who are interested in Distant Reading, Digital Literary Studies, Corpus and Computational Linguistics and/or Literary Theory and their methodological uses across national traditions. ECIs from Inclusiveness Target Countries (ITC) are strongly encouraged to apply!
The lecture parts of the Training School will be made broadly available in online form.

How to apply

We have extended the call for online participants: if you want to participate in the Training School remotely, you can apply until 11th March and will be notified of your acceptance by 15th March. The application procedure is the same as for the original deadline; you can find the information below.

To apply for participation in the Training School “Exploring ELTeC: Use-Cases for Information Extraction and Analysis”, please do the following:
Create an account on e-COST at
Send the following documents as one PDF document (filename = your last name) to Roxana Patras and Christof Schöch by 25th February 2022:

  • a one-page Curriculum Vitae
  • a one-page motivation letter, including:
    • a clear statement of intent and your reasons for participating in this Training School;
    • information about your current level of knowledge / experience in using workshop-related digital tools;
    • what your expectations are (trainers & teaching materials);
    • a statement of whether you want to participate online or in person (and fill in the budget in the e-COST system accordingly).

After submission, all applicants are kindly asked to make sure they have received a confirmation of receipt within 24 hours. If this is not the case, please don’t hesitate to contact us again.
Selection criteria
Applicants will be notified of their acceptance by 4th March 2022.
Basic COST eligibility criteria need to be met.
Applicants with research interests relevant to the COST Action’s goals and activities are prioritized (Rank 1).
Applicants who are early career researchers (including doctoral researchers) and applicants from COST Inclusiveness Target Countries (ITC) are prioritized over other applicants (Rank 2).
Insofar as is allowable by the applications received and the selection criteria above, representation across as many COST countries as possible and a gender balance should be observed among the grantees (Rank 3).
Where all the above criteria are satisfied, places are offered on a first-come-first-served basis (Rank 4).


  1. Christof Schöch: introductory lecture about the project and about ELTeC (probably online). This lecture is an introduction to the objectives of the COST Action ‘Distant Reading for European Literary History’, with a particular focus on the structure of the core deliverable of the project, the multilingual European Literary Text Collection (ELTeC).
  2. Maciej Eder, Joanna Byszuk, Artjoms Šeļa: Exploring and comparing ELTeC corpora with stylometry (online)
  3. Diana Santos: NER exploitation and analysis (online)
  4. Benedikt Perak: ELTeC Data Analysis, Representation of the Geo-Entities and Interlinking with Knowledge bases. (on site)
  5. Fotis Jannidis, Leo Konle: Semantic analysis using word embeddings and language models. (online)
  6. Ranka Stanković, Milica Ikonić Nešić: Wiki-ELTeC data session. Wikidata introduction; Wiki-ELTeC schema (all metadata from header plus main characters, their relations, places); pipeline for Wiki-ELTeC data population; predefined SPARQL query exploration. (on site) Hands-on: population of Wikidata for other languages.
  7. Denis Maurel, Eric Laporte, Tita Kyriacopoulou, Cvetana Krstev: Unitex for processing of literary text: the case of NER automata. Enriching ELTeC texts with Named Entity Recognition, using CasSys to parse texts with Unitex graph cascades of finite-state transducers in different languages. (on site)
  8. and 9. Jessie Labov: ELTeC in Nodegoat: 1) introduction to the Nodegoat interface and how it works with this kind of data; 2) using Nodegoat to work with the ELTeC data (specifically the NER), demonstrating how to enrich it by linking it to open data sources. (online)


Training schedule:

March 22, 2022
9-10 Christof Schöch: What is ELTeC all about?
Maciej Eder, Joanna Byszuk, Artjoms Šeļa: Exploring and comparing ELTeC corpora with stylometry
  10:15 – lecture intro
  11:00 – Stylo intro hands-on
  11:35-11:40 – a small break
  11:40-13:00 – advanced hands-on
14:00-15:30 Ranka Stanković, Milica Ikonić Nešić: Wiki-ELTeC data session
  14:00-14:45 lecture
  14:45-15:30 hands-on
15:45-17:15 Benedikt Perak: ELTeC Data Analysis, Representation of the Geo-Entities and Interlinking with Knowledge bases
  15:45-16:30 lecture
  16:30-17:15 hands-on

March 23, 2022
9-13 Diana Santos: NER exploitation and analysis
  9-9:45 lecture
  15 min break
  10:00-11:30 hands-on
  15 min break
  11:45-13 lecture
14:30-16 Jessie Labov, Pim van Bree, Geert Kessels: Linking historical data with Nodegoat: Introduction
16:15-18:15 Pim van Bree, Geert Kessels: ELTeC in Nodegoat

March 24, 2022
9-13 Denis Maurel, Eric Laporte, Cvetana Krstev: Unitex for processing of literary text: the case of NER automata
  9:00-10:30 lecture
  10:30-10:45 a small break
  10:45-12:00 lecture
  12:00-13:00 hands-on
Fotis Jannidis, Leonard Konle: Semantic analysis using word embeddings and language models
  14:00-15:15 lecture
  15:15-16:00 hands-on

Biographical notes:


Christof Schöch (Trier University): Christof Schöch is Professor of Digital Humanities at the University of Trier, Germany, and Co-Director of the Trier Center for Digital Humanities. He is the chair of the COST Action Distant Reading for European Literary History. Find out more at 

Joanna Byszuk is a researcher at the Institute of Polish Language, Polish Academy of Sciences, as well as a member of the Computational Stylistics Group. Her research focuses on cross-lingual computational stylistics and advancing stylometric methodology and its understanding, especially locating method limitations and developing evaluation procedures. She is also interested in the concept of authorship and in discourse analysis in multimodal and collaboration perspectives. She is also the leader of Working Group 2: Methods and Tools within this COST Action.

Maciej Eder is the Director of the Institute of Polish Language at the Polish Academy of Sciences, and a part-time Associate Professor at the Pedagogical University of Kraków, Poland. His recent research is focused on computational stylistics, or stylometry. As a literary scholar, he is interested in Polish literature of the 16th and 17th centuries, with critical scholarly editions being his main area of expertise.

Artjoms Šeļa is currently doing postdoctoral research at the Department of Methodology of the Institute of Polish Language (Kraków) and is a research fellow at the University of Tartu (Estonia). He holds a PhD in Russian Literature and uses computational methods to understand historical change in literature and culture. His main research interests include stylometry, verse studies and cultural evolution. Sometimes he makes forays into digital preservation and the history of quantitative methods in the humanities.

Diana Santos organized three NER evaluation campaigns for Portuguese, called HAREM, in 2007-2009, and has taught about named entities at several venues, in Portugal and at ESSLLI. She is currently professor of Portuguese language and Statistics for Humanities at the University of Oslo. Find out more at 

Benedikt Perak is an assistant professor at the Faculty of Humanities and Social Sciences, University of Rijeka, where he teaches courses in linguistics, digital humanities and data science. His research interests concern the implementation and development of methods in digital humanities, NLP and data science. Find out more at  

Fotis Jannidis (Würzburg University) is Professor for Digital Humanities at the University of Würzburg in Germany. His main field of research is the quantitative analysis of literary texts. Currently he is the coordinator of the priority program ‘Computational Literary Studies’ with 10 funded projects (Computational Literary Studies). Find out more at

Denis Maurel (Université de Tours, Lifat – Computer Science Research Laboratory): Denis Maurel is Professor of Computer Science at the University of Tours, France. He contributes to the free software Unitex. He is currently working on a French ANR project that applies literature-based discovery to scientific biological papers.

Eric Laporte is a Professor in Computer Science at Université Gustave Eiffel and a member of the LIGM laboratory. He is a linguist as well and his research is mainly about language resources for natural language processing. Find out more at 

Tita Kyriacopoulou (Université Gustave Eiffel, LIGM). Find out more at 

Jessie Labov (Institute for Literary Studies, Humanities Research Center, Budapest) is a Researcher in the Literary Theory Department of the Institute of Literary Studies at the Eötvös Loránd Research Network, where she is developing a new project on Hungarian Literature as World Literature. She is also Vice-Chair of the NEP4DISSENT COST Action. Find out more at  

Cvetana Krstev is a professor at the University of Belgrade, Faculty of Philology. Her research interests include building lexical resources for Serbian for NLP: corpora, electronic dictionaries and wordnet. She is the author of SrpNER, a tool for named entity recognition and annotation for Serbian. Find out more at

Ranka Stanković is an associate professor at the University of Belgrade. Her fields of research are NLP, the semantic web, lexical resources, geoinformation management and deep learning. She is the Head of the Computer Center and the Chair for Applied Mathematics and Informatics, and Vice-president of the Language Resources and Technologies Society (JERTEH). Find out more at 

Milica Ikonić is a teaching assistant at the University of Belgrade, Faculty of Philology, and a PhD student in the Intelligent Systems programme at the University of Belgrade. Her fields of research are NLP, information extraction, linked open data and language models. Find out more at