A new digital library was recently established at the University Library „Svetozar Marković“ in Belgrade, Serbia, initially populated by novels from the Serbian contribution to the ELTeC corpus. Users of the library can read a digital facsimile of a copy of the chosen novel side by side with a text-only version.
Digital Library of Serbian ELTeC Novels (Justin Tonra, 9 March 2020)
This is a translation of an article about our recent Action meeting in Málaga, written by Cristina Fernández and originally published in Málaga Hoy on 18 February 2020. We are grateful for permission to publish this translation of the original article.
The project uses distant reading to search for a common tradition in the 19th-century novel
It would be absolutely impossible for a researcher to read all the novels written in 19th-century Europe in an attempt to discover a common literary tradition in them. Not only would they have to live a thousand lives, but they would also have to know dozens of languages and be familiar with many distant cultures. Today, computational methods can help, thanks to what Franco Moretti called distant reading.
Since November 2017, a four-year COST project financed by the European Union within its Horizon 2020 programme has brought together more than 200 people from some thirty countries for this task, building a large and representative collection of texts in 12 languages so that the elements they share can be analysed. From 17 to 19 February, more than fifty participants from 24 countries are meeting at the University of Málaga.
Literature to understand the world
Rosario Arias, professor of English Philology at UMA, is hosting the group, which will present its progress over three days in the Link by UMA building on the Teatinos campus. Christof Schöch, from the University of Trier in Germany, is the project’s Principal Investigator.
“Normally you read one, three or five novels to write a research paper, but you can’t read thousands of texts because it takes time, and what we don’t want is for neglected texts to be forgotten,” Schöch explains. “You can use the computer and distant reading to get to more texts,” he adds.
Almost a thousand digitized works written between 1840 and 1920
Project researchers are building a multilingual European literary text collection that already contains nearly a thousand digitized volumes published between 1840 and 1920 in a dozen languages. The aim is to reach 2,500.
“From each national tradition we are trying to select texts that are representative, from authors considered canonical and others less known, from men and women, short or longer novels,” explains Justin Tonra, a project participant from Ireland.
“This project tries to make a transnational comparison, but it is very difficult because each tradition is in itself different. That is why the project is very broad, it has many challenges, because we compare multilingual and multicultural traditions,” adds Tonra.
The Principal Investigator indicates that “we try to look for characteristics, features, that are easily comparable and that are common to different traditions such as names of places, of authors, of philosophers, of cities… it is a matter of tracing those elements”.
A collaborative and integrative work
Another aim, as Rosario Arias points out, is “to find out whether what we suppose happened in the 19th century really did happen: a transition from the omniscient narrator to a more introspective narrative”. The description of professions, the importance of social class or religion, whether the novel takes place in rural or urban settings, whether this migration from the countryside to the city can be traced in different traditions, and whether the same thing happens in the Slovenian, Greek, Italian or Spanish novel are questions to which they seek answers.
“The most relevant part of the project is not the tools, which are very interesting in themselves, nor the texts. Above all, it is to seek and propose the foundations of a European cultural identity with the collaboration of many individuals from different traditions. It is a very inclusive, very integrating project; the path is also part of the process, which is why we want to do it cooperatively, as a network,” says Schöch.
The answers are not easy; they are not black and white, and they are difficult to reach because we are working with many researchers, many texts and many traditions, but the process itself is worth it. Until now, no work on European literary history has been carried out at this scale or in such a global and comprehensive way.
UMA welcomes experts in European literature from 24 countries (Justin Tonra, 9 March 2020)
One of the principal aims of our Action, Distant Reading for European Literary History, is to build a multilingual European Literary Text Collection (ELTeC), ultimately containing around 2,500 full-text novels in at least 10 different languages. Today, we are pleased to announce the first public release of ELTeC, with nine language collections included!
The ultimate aim of ELTeC is to provide multiple collections of 100 novels each, published between 1840 and 1920, in their original languages. Work to add more novels to ELTeC is ongoing, as we aim to build a corpus that will help us develop the resources necessary to change the way European literary history is written. Progress towards our goal and current statistics about the collections can be found on the ELTeC Summary page: https://distantreading.github.io/ELTeC/
Language collections in this first release are in German, English, French, Italian, Norwegian (Bokmål and Nynorsk), Portuguese, Romanian, Serbian, and Slovenian. The collections can be downloaded here: https://zenodo.org/communities/eltec/
As work on ELTeC progresses, we invite you to use our collections. Feedback from the community would be very welcome as we improve our collections and work towards future releases. Please feel free to write to me or to the relevant Working Group Leader, Carolin Odebrecht, with your comments, or submit them to our GitHub issue tracker. If you are interested in becoming a member of our Action, find out more here: https://www.distant-reading.net/about/participate/.
Thanks to all of our colleagues from across Europe (and beyond) who have helped us to reach this important milestone!
Distant Reading Novel Collections Released in ELTeC Version 0.5.0 (Justin Tonra, 20 November 2019)
Distant reading, driven by the development of digital technology in the human sciences, has emerged as one of the most prolific approaches to literary texts. Maps, graphs and trees, in Moretti’s (2005) words, allow us to reread famous works in a new way, or to look at large numbers of texts that have long been forgotten. However, approaches to distant reading often disregard how the data under observation are acquired: where do they come from, and how are they created?
Our training school proposes to return to the crucial stage of data acquisition, focusing on the details of the production chain of literary data. The two-day course will start with OCR (optical character recognition), which transforms an image into machine-readable text, and will address the difficulties introduced by variation in graphic systems and by the materiality of old artifacts. The second, decisive step is encoding in XML-TEI, which turns the text into a usable database and allows additional information (e.g., author, gender, period) to be added to it for subsequent analysis. The third and final step is analysis with R, which allows hypotheses to be tested and patterns to be explored by analysing and visualising data.
With a strong emphasis on practical experience, this training school is geared towards building the framework for a first multilingual Swiss literary corpus (French, Italian and German). Participating in its construction during the training school will provide an opportunity to discuss pertinent issues.
This course is part of the collective work carried out within the European COST Action “Distant Reading for European Literary History”, in which the organizers are the Swiss representatives: https://www.distant-reading.net/
The working language of the training school is English; knowledge of at least one of the three languages of the literary data (French, Italian, German) is also required.
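To make the second step more concrete, here is a minimal Python sketch, assuming a simplified TEI file, of how metadata recorded in a TEI header can be read back and used to select texts by period (here the 1840–1920 range used by ELTeC). The sample document, the header paths chosen, and the function names `read_tei` and `in_period` are illustrative assumptions only; ELTeC and the training school use their own encoding conventions (and R rather than Python for analysis), which this sketch does not reproduce.

```python
import re
import xml.etree.ElementTree as ET

# Standard TEI namespace; the header layout below is a hypothetical, simplified example.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example Novel</title>
        <author>Example Author</author>
      </titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc>
        <bibl><date when="1868">1868</date></bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>It was a dark night.</p><p>The end.</p></body></text>
</TEI>"""


def read_tei(xml_string):
    """Extract title, author, year of the source edition and a token count."""
    root = ET.fromstring(xml_string)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI)
    author = root.findtext(".//tei:titleStmt/tei:author", default="", namespaces=TEI)
    date = root.find(".//tei:sourceDesc/tei:bibl/tei:date", TEI)
    year = int(date.get("when")) if date is not None and date.get("when") else None
    body = root.find(".//tei:text/tei:body", TEI)
    text = " ".join(body.itertext()) if body is not None else ""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware in Python 3
    return {"title": title.strip(), "author": author.strip(), "year": year, "tokens": len(tokens)}


def in_period(meta, start=1840, end=1920):
    """True if the recorded year falls inside the (hypothetical) selection window."""
    return meta.get("year") is not None and start <= meta["year"] <= end


meta = read_tei(SAMPLE)
print(meta)             # {'title': 'Example Novel', 'author': 'Example Author', 'year': 1868, 'tokens': 7}
print(in_period(meta))  # True
```

Keeping the metadata in the header, rather than in file names or a separate spreadsheet, is what lets a later analysis step filter or group texts without re-reading the full text.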
All information, including the full training school programme, can be found in French, German, and Italian here and here.
Registration process: the target group of the training school is doctoral students affiliated with the universities of Basel, Bern, Fribourg, Geneva, Neuchâtel and Lausanne, as well as with the EPFL. Postdoctoral researchers can apply by short email; their registration depends on that of PhD students, who have priority.
Please register by sending an email to firstname.lastname@example.org. Participation is free of charge for doctoral students. All travel and accommodation expenses are covered by the doctoral program.
Course title: “Distant Reading – Tools and Methods”
Instructors: Simon Gabay, Berenike Herrmann, Simone Rebora, Elias Kreyenbühl
Date: 12 and 13 December 2019
Location: Basel Public and University Library (UB)
Invitation to Doctoral Training School: "Distant Reading - Tools and Methods." Basel, 12-13 December 2019. (Justin Tonra, 13 November 2019)
The First Workshop on Distant Reading in Portuguese will take place on 27-28 October 2019 at the University of Oslo, and will feature a presentation on our COST Action on Distant Reading for European Literary History by Isabel Araújo Branco, Diana Santos, Paulo Silva Pereira and Raquel Amaro.
At the workshop, which features additional presentations by members of our Action, participants will illustrate the state of the art, discuss research questions for the medium and long term, and take a position on several Portuguese-related matters within the sphere of Distant Reading.
Further details, including the programme of the workshop and abstracts, are available here: Portuguese | English (via Google Translate).
We are very happy to announce that Action members Roxana Patraș and Ioana Galleron have received funding approval for their project HAI-RO (Hajduk Novels in Romania During the Long Nineteenth Century: Digital Edition and Corpus Analysis Assisted by Computational Tools) within the “Brâncuși” Program for Integrated Actions. Although the acronym HAI-RO originally stood for “Romanian hajduk”, we like to think of it in terms of its literal translation: “Come on, Romania!”
The program rests on an enthusiastic collaboration between the “Alexandru Ioan Cuza” University of Iași and Université Paris 3 Sorbonne Nouvelle, which started when the project’s principal investigators decided to experiment with a series of TEI-XML novels for ELTeC (the “European Literary Text Collection” developed within the Distant Reading COST Action). This further led to the formation of a French-Romanian team whose members are Camelia Grădinaru, Ioana Lionte and Alexandra Oltean (both of them Training School grantees), Lucreția Pascariu, Chiara Mainardi, and Ofra Lévy.
In a nutshell, the project seeks to remedy the shortcomings of the resources and tools designed for Romanian Computational Literary Studies. The research will focus on characterising the features inherent to “the hajduk literature” by adapting the instruments of corpus linguistics to the specific features of literary texts printed in non-standardised Romanian. This will be complemented by an analysis of the external features that developed over the historical evolution of a literary genre generally labelled as “national”.
The main result of the project will be the creation of a Romanian literary corpus (1850-1950) conforming to the XML-TEI international standard for digital editing and including semantic annotations. The corpus will be made available on an Open Access basis, stored on the Nakala platform (France) and referenced on HAL-SHS. In accordance with its scientific objectives, the project will produce a schema and an annotation guide for spatial terms, both adapted to literary texts. It will also provide rich input for the Romanian collection of ELTeC.
The HAI-RO project: a new offshoot of the Distant Reading COST Action (Justin Tonra, 28 August 2019)
Action members Mike Kestemont (University of Antwerp) and Maciej Eder (Polish Academy of Sciences) are happy to report that they have recently secured funding for a three-year collaborative research project, following a joint call of the Research Foundation Flanders (FWO) and the Polish Academy of Sciences (PAS). The project can be considered a spin-off of this COST Action and, as an academic air bridge between Antwerp and Krakow, it will intensify the already strong ties between various research teams in their respective institutions.
The project is entitled ‘Deep Learning in Computational Stylistics.’ In this collaboration, they aim to turn their attention to “deep” representation learning in order to improve computational methods for the robust stylistic analysis of short documents (< 1000 words). Although this technology is now also emerging in Humanities research, it is surprising how (relatively) few applications have been reported so far in the domain of authorship attribution. The few examples published in this domain focus on micro-blogging data and are hard to extrapolate to longer documents. The researchers propose a three-year collaboration aimed at introducing and adapting deep learning methods to computational stylistics, with an emphasis on author identification.
Research Funding Success for Action Members (Justin Tonra, 29 January 2019)
Two of my colleagues from Maynooth University and I attended the COST Action Training Schools at the EADH 2018 conference in Galway. While one of us signed up for the theory sessions, discussing different approaches to ‘style’ in the digital humanities, an early career researcher in Early Irish and I (currently working as project manager in Digital Humanities) attended the sessions on methods and tools of ‘Distant Reading’. Our workshop group was very international and brought together people from various disciplines. Most of us had only recently started to use digital tools extensively for data/text analysis and had limited programming experience. For that reason, the step-by-step introductions to different technology-supported methods of topic modelling, stylometry, and data visualisation were a perfect fit. We were introduced to downloadable software with elaborate graphical user interfaces (e.g. TXM and Gephi), to portable software still in development (DARIAH Topics Explorer), and to a stylometry tool based on R libraries, which required working on the command line.
Participants at the Methods & Tools Training School, Galway (5-7 December 2018)
The corpora chosen by the workshop facilitators were mainly selections of British fiction, the North American Brown Corpus and some smaller fiction corpora in other European languages (French, Italian, Hungarian, Slovene). For me, as a historian specialising in the visual cultures and politics of the early modern period, these were uncommon sources, but at the end of most workshop sessions I had some time to apply each method and tool to my own corpora (e.g. a collection of political letters from Ireland). The workshop facilitators made sure that all participants were able to keep pace, and they answered our questions competently. However, as not all the methods and tools presented to us will be equally relevant to our future research, additional ‘experimental time’ on the last workshop day, working with just one of the methods or tools in smaller groups, would have been even more beneficial. There was a lot to take in, and a longer supervised ‘lab’ session focusing on a chosen method and my own material would also have helped me to process and practise what I had learned. In this way, the instructors, too, could have received more in-depth feedback, especially where their tools were still being updated and improved.
Nonetheless, the overall timing of the workshop suited me very well as we had the opportunity to connect with other participants during coffee breaks, lunch, and in the evenings. It was interesting to hear how other scholars at a similar career level were going to use topic modelling, stylometry, or network analysis in their projects, and I learned a lot about the institutional frameworks and digital cultures in other universities. Finally, the vivid keynote lecture delivered by Prof. Christof Schöch was a convenient occasion to sit back and reflect on some of the overarching challenges behind digital literary analysis. I am very grateful for the opportunity to attend the COST Action Training School and will recommend it to my peers.
Methods & Tools: a Report on the Distant Reading Training School (Justin Tonra, 13 December 2018)
This guest post was written by Action Member Dr Pieter Francois (WG1 & WG3), Associate Professor in Cultural Evolution at Oxford University.
Over the past six months I have been in extensive contact with Professor Tao Wang and his Digital History Centre at Nanjing University, China. Our mutual friend, Simon Mahony (University College London), who knew I was keen to set up collaborations with digital humanists in China, put us in touch and told me that I simply had to speak to Tao. In November 2018 this resulted in a first visit to Nanjing University. I came back impressed by the quality of, and the level of commitment to, Digital Humanities at Nanjing University, and feeling invigorated and full of ideas and plans to deepen this promising collaboration.
Pieter Francois in China
On the first day of my visit, Tao had arranged for me to meet some of his closest colleagues at his Centre and to get a hands-on introduction to a range of projects. Professor Gang Chen’s project on the Historical Geographic Information System for Six Dynasties, for example, is one of the finest examples of a full integration of archaeological, material culture and textual data made accessible through spatial querying. It is also an absolute labour of love for Gang Chen and his students. In the afternoon, Tao had arranged a meeting with a number of his colleagues at the Humanities and Social Sciences Big Data Institute of Professor Lei Pei and Professor Jiang Li. Their work on tracking the global mobility of Chinese students and staff was very impressive.
On the second day of my visit, I gave an invited lecture to approximately 50 enthusiastic faculty and students of Digital Humanities at Nanjing University. My talk focused especially on our COST Action project ‘Distant Reading for European Literary History’ and my ‘Seshat: Global History Databank’ project. The talk was followed by a very lively discussion, which we continued in the tea house afterwards. In particular, I established a real intellectual rapport with Professor Jing Chen. We left the tea house with a number of specific plans to introduce both projects to a wider audience in China. No doubt this visit is only the first step in setting up meaningful collaborations between all the research groups involved!
Distant Reading in China (Justin Tonra, 29 November 2018)
Following our recent Action meetings in Antwerp, WG2 member and Chief Content Architect at Wolters Kluwer Germany, Christian Dirschl, offered the following thoughts on our project from his perspective as an Information Scientist working in an industrial setting.
At the beginning of October, I participated in the meeting of Working Groups 2 and 3 in Antwerp. I am an Information Scientist who usually works on legal information rather than literary texts, so I considered myself an outsider to this group. Still, I joined WG2 and was very curious about how the digital humanities are dealing with the specific challenges they face.
I felt very welcome, both because of the people at the meeting and because of the discussions that were going on, which sounded quite familiar to me.
There were discussions about the balancing act between enriching documents manually by human experts and automatically by machines. Another question was whether to offer basic technological infrastructure or to aim at sophisticated and complex algorithms, which might not reach the maturity level required in an operational environment. And then there were open questions: whether to head for a single technology that serves all languages, or whether dedicated monolingual tools would be superior in the end, with the drawback that the results would hardly be comparable across the whole corpus.
Members of our Action bask in the Antwerp sunshine after three days of meetings last week.
My own experience with these technologies is very similar and obviously, there is no right or wrong answer. A complex challenge requires a complex solution—or a magic wand!
Although Deep Learning sometimes appears to be this wand, it was clear from the start that its application area in this Action is important, but limited. So other solution streams also need to be investigated. I am very much looking forward to seeing what the final decision will be.
The Action has an interesting and ambitious goal and there were enough dedicated experts around the table to make sure that quite a lot will be achieved within the limited available resources.
What I have learned in the last five years or so is that technical progress needs to be aligned with customer needs, or rather, in this case, researchers’ requirements. And I have the impression that academia in general is still very much on an exploratory path. Most of the time, this leads to more knowledge, but less applicability. So my advice is to check regularly whether the intermediate results show progress on current (!) research requirements, not only in general, and then to adapt to this feedback, so that an optimal practical solution is eventually achieved. This may sound odd to some researchers, but in my experience it is the most efficient way forward.
I really enjoyed the two days in Antwerp and I am looking forward to further collaboration in the future. All the best to the Action and its participants!
Information Science and Distant Reading: an Industry Perspective (Justin Tonra, 17 October 2018)
Distant Reading is a COST Action funded by the Horizon 2020 Framework Programme of the EU.