The workshop was based around three types of activities:
- Presentations of the project’s activities, aimed at sharing best-practice techniques.
- Buzz Groups, where participants could discuss the presentations and write down questions for the panel discussion.
- A panel discussion, where participants had a chance to pose questions to the project’s partners.
Alastair Dunning, Clemens Neudecker and Stefan Pletschacher preparing for panel discussion.
Presentations

Marieke Willems from LIBER opened the workshop with an introduction to Europeana Newspapers and an overview of the day's programme.
She was followed by Alastair Dunning from The European Library. His presentation, “Surveying newspaper digitisation in European libraries, then aggregating them!”, looked at the survey conducted among libraries on the extent of newspaper digitisation. One of the key findings was that only 26% of the libraries had digitised more than 10% of their newspaper content. The survey is now being run again so that more libraries can share their experience with newspaper digitisation. Add your voice.
The third speaker was Clemens Neudecker from the National Library of the Netherlands. He presented and explained many technical aspects of the project, including the complex refinement processes being used, the tools that were created specifically for the project and the development of Named Entity Recognition (NER) in four languages for newspaper content.
Stefan Pletschacher from the University of Salford spoke about “Digitisation Workflows and Evaluation Approaches”. In order to create a good user experience, Pletschacher said, it was critical to have a clear idea of different use-case scenarios and of how various features such as layout analysis, reading order detection and text recognition would be used.

The final workshop speaker was Günter Mühlberger from the University of Innsbruck. He spoke about the ENMAP metadata profile being used by the project; a public version of this profile will be available in October. Mühlberger said that ENMAP provided a practical solution for coping with different data formats, which was important for a project aiming to create, and make accessible, vast amounts of digital data. He finished his presentation with food for thought on structural metadata: what is a headline, an advertisement, a supplement or an opinion section?
Panel discussion

The second part of the workshop was moderated by Alastair Dunning. On the panel were two of the speakers – Clemens Neudecker and Stefan Pletschacher – along with Birgit Seiderer (Bavarian State Library) and Tomas Foltyn (National Library of the Czech Republic). The discussion started with the question: “Do you give the institutions a copy of the re-processed data?” Neudecker confirmed that this was the case, and added that the project's workflow also allowed libraries to compare old and new results after processing.
Other topics raised during the discussion included:
- Named Entity Recognition (NER) – Challenges were discussed, including historical spelling variations and inconsistent spellings across different source materials. The Europeana Newspapers Project is developing NER training material in four languages. It was also noted that the resources of the IMPACT project could be helpful, and that the National Library of the Czech Republic was supporting the development of databases for historical Czech. All panellists agreed that extending NER to other languages would be interesting.
- Crowd Sourcing – Several questions came up concerning crowd sourcing of user contributions. Pletschacher said one study had shown that such systems are often not appealing or user-friendly. Dunning disagreed, citing two examples that worked very well: one from Australia (http://trove.nla.gov.au/general/participating-in-digitised-newspapers-faq/) and one from the UK (http://blogs.ucl.ac.uk/transcribe-bentham/). Foltyn spoke about his idea of establishing a step-by-step method for correcting errors in digitised content through crowd sourcing. The discussion ended with Dunning's observation that niche sourcing (a more targeted type of crowd sourcing) was also a possibility.
- Metadata and Zoning – The panel compared different approaches to defining structural metadata. In the case of newspapers, it was felt that structure and layout characteristics affected the meaning and perception of the text. These issues were still being discussed and defined.
- User Behaviour – A delegate from Latvia noted that some users were reluctant to consult the digital version of a newspaper. Seiderer said that some researchers needed to consult physical newspapers because they were looking for personal notes, or examining the type of ink and paper used. In other words, they were looking for more than the newspaper's historical content.
The final question concerned the difficult field of copyright and personal rights for 20th-century newspapers. It was noted that rights issues are handled very differently across Europe, and that examples from Norway and Switzerland show that newspaper publishers are ready to cooperate with libraries to make more recent content available.

Further workshops

The Europeana Newspapers Project will hold two more workshops, where you can learn more about the work we are doing with digital newspapers.
- “Aggregation and Presentation”, held jointly with The European Library conference “Improving innovation in Europe”, on September 16th in Amsterdam. http://www.eventbrite.nl/org/3891830439?s=14727265
- “European Newspapers and the Digital Agenda for Europe”, on September 29-30th at the British Library in London.