One of the aims of the Distant Reading COST Action is to coordinate the creation of a multilingual European Literary Text Collection (ELTeC). Such a collection is a prerequisite for developing tools and methods of analysis that are comparable in nature, scope and quality across several European languages. The ELTeC will be built in three iterations:
- First iteration: 6 subcollections (100 novels per language) for the period ca. 1850 to 1920, providing a starting point for research.
- Second iteration: at least 4 additional subcollections (100 novels per language) for the same period, completing the “ELTeC core”.
- Third iteration: extensions to the “ELTeC core” with at least 6 additional subcollections (a) in additional languages, widening the range of ELTeC, (b) for languages already included, but covering the earlier period from ca. 1780 to 1850, enabling diachronic views on literary history, and (c) with additional, larger but less strictly structured subcollections for languages already included, providing a broader empirical base for specific analyses.
The ELTeC core will contain at least 10 linguistically annotated subcollections of 100 novels each, comparable in their internal structure and covering at least 10 different European languages, for a total of at least 1,000 full-text novels. The extended ELTeC will take the total number of full-text novels to at least 2,500. The novel was chosen from among the major literary genres because of its availability and size. The chronological limits reflect copyright constraints and the availability of good-quality full texts.
Work on ELTeC is in progress on the Action’s GitHub pages.