The task of WG1 is to facilitate the creation of the European Literary Text Collection (ELTeC). This work falls into three distinct tasks: first, defining selection criteria (corpus design); second, developing basic encoding methods (for both data and metadata); and third, defining a suitable workflow for preparing the corpus. Building such a benchmark corpus requires a corpus design that makes texts and individual sub-collections comparable across different sets of metadata. Every COST Action member should be able to sample sub-collections from the ELTeC for specific tasks and research questions, and to reformat them as appropriate for their own tools. The ELTeC encoding scheme therefore does not aim to represent texts in all the original complexity of their structure or appearance; rather, it aims to support a richer and better-informed distant reading than a transcription of their lexical content alone would permit. We hope to achieve these objectives in a collaborative and interdisciplinary way.
- For more about the corpus, see our page dedicated to ELTeC
- The Working Papers of WG1 can be viewed at http://distantreading.github.io/




Universiteit Antwerpen, Antwerp
Jožef Stefan Institute, Ljubljana
University of Trieste, Trieste
Charles University, Prague
École Normale Supérieure de Lyon, Lyon
Faculdade de Ciências Sociais e Humanas, Lisbon
University of Oxford, Oxford
International Burch University, Sarajevo
Institute of Polish Language, Krakow
University Spiru Haret Bucharest, Bucharest
United Kingdom
South-West University, Blagoevgrad
National and Kapodistrian University of Athens, Athens
University of Belgrade, Faculty of Philology, Belgrade
Universität Trier, Trier
University of Alicante - Departamento de Lenguajes y Sistemas Informáticos, Alicante
Department of Philosophy, Sociology, Education & Applied Psychology, Padova
King's College London, London
King's Digital Laboratory, London
