Text and Data Mining: Challenges and solutions from the publishers’ perspective

Posted: 03-12-2015 Topics: TDM

On 11 November, OpenMinTeD (a project in which LIBER participates) and Europeana organised a workshop titled ‘Text and Data Mining in Europe: Challenges and Action’. The goal of the workshop was to bring together content providers (publishers, data centers, museums and libraries) who are open to making their data available for Text and Data Mining (TDM).

Among the participants were a number of publishers and content providers, including Wiley, Ubiquity Press, Frontiers, CrossRef, Wikipedia and Copernicus. All the participants at the workshop were fully aware of TDM’s great potential for society.

The workshop was divided into three interactive sessions on “Positives”, “Negatives” and “Next steps”.

Positives: TDM Best Practices & Visions

The first session was a best-practice session, in which Richard Eckart de Castilho (Technical University Darmstadt) and Lotte Wilms (Royal Library the Hague) presented what their organisations are doing to make text and data mining possible. In the interactive session that followed, the participants shared their visions of why they want to make their data available for TDM.

Negatives: Challenges Encountered

The second session was a “challenges session”, in which Max Kaiser (National Library of Austria) and Lucie Guibault (Information Law, University of Amsterdam) presented the challenges in their areas of expertise. In the interactive session that followed, the participants identified the technical, organisational, legal and other challenges that keep them from realising their TDM vision:

  • The technical challenges identified are the quality of datasets and the lack of a proper, secure infrastructure.
  • The organisational challenges identified include an internal lack of skills and resources, and internal resistance from managers who see risks. Externally, communities and funding bodies are fragmented, and the different stakeholders need to be brought together.
  • The legal challenges identified include the absence of standard licences, researchers’ confusion about what is legal and what is not, and the lack of a Europe-wide harmonised law on text and data mining.
  • Various other challenges were mentioned, such as the need to make stakeholders more aware of the opportunities and benefits of text and data mining.

Next Steps: Solutions to Overcome the Identified Challenges

In the third session, the participants came up with solutions to overcome these challenges:

1) Improve and align the quality of data

  • Develop and use open standards
  • Define templates for metadata and content
  • Allow for peer review of data quality, develop validation tools, and recognise good-quality data
  • Organisations should invest human resources and money to improve the quality of their data

2) Convince stakeholders

  • Convince research funders: engage more with research funders to emphasise the importance of research outputs involving text and data mining
  • Convince policy makers: demonstrate the viability of business models and market opportunities
  • Convince policy makers: make sure TDM is included in the discussions around open science and open access infrastructure
  • Convince publishers: make publishers (small and big) realise that their data have great societal value and that TDM can bring this value to the surface
  • To all stakeholders: showcase TDM success stories to raise EU-wide awareness

3) Develop sustainable services

  • Build on current infrastructures: collect, describe and classify what is already out there, perform a gap analysis, promote collaboration and define business models that will be used after EU projects have finished
  • Use and publish open source software, use open standards and a sound business model, make services scalable and federated, and keep fixed costs low
  • Ask the community, work on technical case studies and reach a consensus on core functionalities and services

4) Address legal challenges in the short and long term

  • Get more involved in current European copyright reform discussions
  • Sign The Hague Declaration
  • Highlight the readiness of OA publishers to support TDM standards
  • Advocate for the use of a limited set of licensing schemes (use Creative Commons)

Valued Outcome

The outcomes of the workshop are very valuable to the OpenMinTeD project and its sister project FutureTDM, as the outcomes provide an excellent overview of the state of play of TDM opportunities in Europe from the content providers’ perspective. The OpenMinTeD project will take the outcomes of this and future stakeholder workshops into consideration in its proceedings and pay specific attention to the technical challenges and solutions.

In the survey held after the workshop, participants reported that they had made useful new contacts, had learned new things about text and data mining, and had enjoyed the productive, relaxed working atmosphere.

Wish You’d Been There?

This workshop was the first in a series of OpenMinTeD stakeholder workshops. We hope to see you at the next OpenMinTeD Text and Data Mining workshop at the LREC conference in May 2016. In the meantime, you can follow OpenMinTeD on Twitter and read the regularly updated OpenMinTeD blog!

You can find the presentations that were held at this workshop on OpenMinTeD’s Slideshare account.

For further questions about OpenMinTeD workshops, contact LIBER.