The task for WG1 is to facilitate the creation of the European Literary Text Collection (ELTeC). This work divides into three parts: first, defining selection criteria (corpus design); second, developing basic encoding methods (for both data and metadata); and third, defining a suitable workflow for preparing the corpus. To create such a benchmark corpus, we need a corpus design that allows texts and individual sub-collections to be compared according to different sets of metadata. It should be possible for every COST Action member to sample sub-collections from the ELTeC for specific tasks and research questions, and to reformat them in ways appropriate to their own tools. The focus of the ELTeC encoding scheme is thus not to represent texts in all their original complexity of structure or appearance, but rather to facilitate a richer and better-informed distant reading than a transcription of their lexical content alone would permit. We hope to achieve these objectives in a collaborative and interdisciplinary way.
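As a rough illustration of what sampling a sub-collection by metadata could look like, the following Python sketch filters a small in-memory list of records and draws a reproducible sample. The record fields (`language`, `year`, `author_gender`), the `TextRecord` class and the `sample_subcollection` function are hypothetical names chosen for this example; they are not the ELTeC metadata scheme or any existing tool.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """One corpus text with a few illustrative (hypothetical) metadata fields."""
    text_id: str
    language: str
    year: int
    author_gender: str


def sample_subcollection(records, size, seed=0, **criteria):
    """Return up to `size` records whose metadata equal every given criterion.

    If fewer than `size` records match, all matching records are returned
    in their original order.
    """
    matching = [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]
    if size >= len(matching):
        return matching
    # A seeded generator keeps the sample reproducible across runs.
    return random.Random(seed).sample(matching, size)


# Invented toy records, for demonstration only.
corpus = [
    TextRecord("FR001", "fr", 1850, "F"),
    TextRecord("FR002", "fr", 1872, "M"),
    TextRecord("FR003", "fr", 1899, "F"),
    TextRecord("DE001", "de", 1861, "M"),
    TextRecord("DE002", "de", 1890, "F"),
]

french_sample = sample_subcollection(corpus, 2, language="fr")
print([r.text_id for r in french_sample])
```

Fixing the seed is one simple way to let different members reproduce the same sub-collection; other sampling strategies (for example, balancing across time periods) would need additional logic.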
The Working Papers of WG1 can be viewed at http://distantreading.github.io/ .