What is ELTeC?

ELTeC is the European Literary Text Collection. It is one of the key deliverables of the COST Action ‘Distant Reading for European Literary History’ (CA16204), which ran from 2017 to 2022. ELTeC is a collection of corpora of literary texts that are comparable in nature, scope and quality across several European languages. Its availability is an essential condition for the creation, evaluation and use of multilingual tools and methods of analysis for literary texts. Among the major literary genres, the novel was chosen for reasons of availability and size. The chronological limits result from constraints related to copyright and to the availability of high-quality full texts.

ELTeC has three components:

  • ELTeC core: 12 complete corpora in 12 different languages, with 100 novels per language, and with comparable internal structure, for the period 1840 to 1920.
  • ELTeC plus: 9 corpora in 9 additional languages, also covering the period 1840 to 1920, and following the same composition principles as ELTeC core, but (currently) containing fewer than 100 novels each.
  • ELTeC extensions: additional collections representing languages already represented in ELTeC, but either (a) containing additional novels from the same period as ELTeC core and ELTeC plus, in order to broaden the empirical base for analyses, or (b) covering earlier periods, notably the time from 1750 to 1840, enabling diachronic views on literary history.

In 2021, ELTeC core was completed, with 10 corpora of 100 novels comparable in their internal structure in at least 10 different European languages. As of March 2023, the ELTeC plus corpora take the total number of available full-text novels to more than 1544; with the ELTeC extensions, more than 2000 full-text novels are included in ELTeC. All texts are available as XML-TEI files that are valid against a specific schema, and many are also available with linguistic annotation.
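Because the texts are distributed as XML-TEI, they can be read with any standard XML library. The following is a minimal sketch in Python, not an official ELTeC tool. The TEI namespace is the standard one, but the sample document is invented for illustration, and the assumption that chapters are encoded as div elements with type="chapter" should be checked against the ELTeC schema before the code is used on real files.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the "tei" prefix is only a local shorthand.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample document, NOT taken from any ELTeC collection.
# Assumption: chapters are marked up as <div type="chapter">.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An Invented Novel</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="chapter"><p>First paragraph.</p><p>Second paragraph.</p></div>
      <div type="chapter"><p>Third paragraph.</p></div>
    </body>
  </text>
</TEI>"""


def summarize_tei(xml_text):
    """Return the title, chapter count and paragraph count of a TEI document."""
    root = ET.fromstring(xml_text)
    # Title from the header; empty string if the element is missing.
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    # Restrict counting to the body so that header paragraphs are ignored.
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return {"title": title.strip(), "chapters": 0, "paragraphs": 0}
    chapters = body.findall(".//tei:div[@type='chapter']", TEI_NS)
    paragraphs = body.findall(".//tei:p", TEI_NS)
    return {
        "title": title.strip(),
        "chapters": len(chapters),
        "paragraphs": len(paragraphs),
    }


summary = summarize_tei(SAMPLE)
print(summary)
```

To apply the same function to a downloaded file, read the file contents as text and pass them to summarize_tei; the file name and location depend on the individual collection's repository.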

More information on ELTeC


Citation suggestion for ELTeC: European Literary Text Collection (ELTeC), version 1.1.0, April 2021, edited by Carolin Odebrecht, Lou Burnard and Christof Schöch. COST Action Distant Reading for European Literary History (CA16204). DOI: https://doi.org/10.5281/zenodo.4662444. (See also the citation suggestions for individual ELTeC collections in their respective repositories on GitHub.)

Reference publications (please cite one or both if you use one or several of the corpora included in ELTeC):

  • Lou Burnard, Christof Schöch, Carolin Odebrecht (2021): “In Search of Comity: TEI for Distant Reading”, in: Journal of the Text Encoding Initiative 14. DOI: https://doi.org/10.4000/jtei.3500.
  • Christof Schöch, Roxana Patraș, Diana Santos, Tomaž Erjavec (2021): “Creating the European Literary Text Collection (ELTeC): Challenges and Perspectives”, in: Modern Languages Open 1/25. DOI: http://doi.org/10.3828/mlo.v0i0.364.