What is ELTeC?
ELTeC is the European Literary Text Collection. It is one of the key deliverables of the COST Action ‘Distant Reading for European Literary History’ (CA16204) that ran from 2017 to 2022. ELTeC is a collection of corpora of literary texts that are comparable in nature, scope and quality across several European languages. Its availability is an essential condition for the creation, evaluation and use of multilingual tools and methods of analysis for literary texts. Novels have been chosen among major literary genres for availability and size. Chronological limits are due to constraints related to copyright and availability of quality full texts.
ELTeC has three components:
- ELTeC core: 10 complete corpora in 10 different languages, with 100 novels per language, and with comparable internal structure, for the period 1840 to 1920.
- ELTeC plus: 10 corpora in 10 additional languages, also covering the period 1840-1920, and following the same composition principles as ELTeC core, but (currently) containing fewer than 100 novels.
- ELTeC extensions: collections representing languages already represented in ELTeC, but either (a) containing additional novels from the same period as ELTeC core and ELTeC plus, in order to broaden the empirical base for analyses, or (b) covering earlier periods, notably the time from 1750 to 1840, enabling diachronic views on literary history.
In 2021, ELTeC core was completed, with 10 corpora of 100 novels comparable in their internal structure in at least 10 different European languages. As of May 2022, the ELTeC plus corpora take the total number of available full-text novels to more than 1350; with the ELTeC extensions, more than 2000 full-text novels are included in ELTeC.
More information on ELTeC
- An overview of the current state in ELTeC corpus building can be found here: https://distantreading.github.io/ELTeC/
- Work on the different ELTeC corpora is in progress here: https://github.com/COST-ELTeC
- A collection of relevant documentation can be found here: https://distantreading.github.io/
- The schema files for the different levels of encoding are available as well: https://github.com/COST-ELTeC/Schemas
- ELTeC page on Zenodo, with archived releases, one for each corpus: https://zenodo.org/communities/eltec/
Citation suggestion for ELTeC: European Literary Text Collection (ELTeC), version 1.1.0, April 2021, edited by Carolin Odebrecht, Lou Burnard and Christof Schöch. COST Action Distant Reading for European Literary History (CA16204). DOI: https://doi.org/10.5281/zenodo.4662444. (See also the citation suggestions for individual ELTeC collections in their respective repositories on GitHub.)
Reference publications (please cite one or both if you use one or several of the corpora included in ELTeC):
- Lou Burnard, Christof Schöch, Carolin Odebrecht (2021): “In Search of Comity: TEI for Distant Reading”, in: Journal of the Text Encoding Initiative 14. DOI: https://doi.org/10.4000/jtei.3500.
- Christof Schöch, Roxana Patraș, Diana Santos, Tomaž Erjavec (2021): “Creating the European Literary Text Collection (ELTeC): Challenges and Perspectives”, in: Modern Languages Open 1/25. DOI: http://doi.org/10.3828/mlo.v0i0.364.