LIBER Responds to PLS Survey on Text and Data Mining

Erroneous Claims Based on Scant Evidence: the PLS Survey on Text and Data Mining

Recently the Publishers Licensing Society (PLS) released their interpretation of the results of a survey on text and data mining (TDM) [1] “to identify the number and types of requests publishers are receiving and how they are being dealt with”. While admitting that “there is no available data on the full scale of text and data mining activity in the research community” and that the number of requests received seems to be low, PLS state that their services and licences are facilitating TDM. However, there is simply no data to back this up.

Licences are facilitating TDM?

The survey presents absolutely no evidence that licences are facilitating TDM. Indeed, the statement that publishers are increasingly including TDM provisions in their licences is pure conjecture. A more interesting question would have been: how many publishers explicitly forbid TDM activity in their licence agreements?

On the UK exception for TDM

If the logic used to reach the conclusion that licences are facilitating TDM were applied to the data about the low number of requests for permission to carry out TDM from UK-based researchers, one could surmise that the lack of requests is an indication that the TDM exception in the UK is, in fact, working. Unlike licence agreements, which may or may not exist and can have differing terms, the UK exception provides clarity to researchers, who are no longer required to ask permission from publishers to perform TDM on content to which they have legal access. The requests received by publishers in 2015 should therefore only pertain to commercial activity, which is not covered by the exception.

Publisher responses to requests

In their analysis of responses, PLS claim that nearly all requests for TDM were granted. In fact, of the 16 publishers who received requests, only 12 responded positively, and even then only in 80-100% (whatever that means) of cases. A basic calculation gives an estimate that between 60% and 75% of requests are granted. This is far from “nearly all” requests. PLS also reported a “fast” response time of under two weeks. The definition of “fast” really depends on where you are standing. From the position of a researcher working with high volumes of data and high-performance computing technology, a two-week delay in starting work is an eternity.
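The 60% to 75% estimate above can be reproduced with a short calculation. This is a rough check, not a re-analysis of the survey data: it assumes that each of the 16 publishers received a similar number of requests and that the four publishers not counted as responding positively granted none. Neither assumption comes from the PLS report, and the variable names are purely illustrative.

```python
# Rough check of the share of TDM requests granted.
# Assumptions (ours, not from the survey): every publisher received the same
# number of requests, and the 4 publishers not counted as positive granted none.
publishers_with_requests = 16
publishers_positive = 12
grant_rate_low, grant_rate_high = 0.80, 1.00  # the reported 80-100% band

# Share of publishers that responded positively: 12 / 16 = 0.75
share_positive = publishers_positive / publishers_with_requests

# Scale that share by the lower and upper ends of the reported band.
overall_low = share_positive * grant_rate_low
overall_high = share_positive * grant_rate_high

print(f"Estimated share of requests granted: {overall_low:.0%} to {overall_high:.0%}")
```

If the requests were in fact spread unevenly across publishers, the true figure could fall outside this range, which is one more reason the underlying data should be published.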

Low number of requests

It should be noted that the survey was conducted by UK organisations and therefore may not reflect the situation across the EU. However, the low number of requests for permission to perform TDM could easily be an indication of any combination of the following:

  • The chilling effect of having to ask permission – at EU level researchers have largely asserted the importance of e-Science but may be dissuaded from performing TDM because of bureaucracy
  • Lack of clarity around the legality of TDM (is it necessary to ask permission?)
  • Researchers are mining content without alerting publishers

The bigger picture

Fundamentally, this survey ignores the bigger picture, which is that TDM will enable data analysis at scale, across disciplines and formats. The content of academic journals is only a small fraction of the content that could be mined. Minable content also includes databases, blogs, digitised cultural heritage, video clips, voice recordings: the whole of the open Web. No licence or combination of licences can ever facilitate the true potential scale of TDM.

Copyright reform debate

With this legislative term set for heated EU policy debate on copyright and TDM, we are disappointed by the sort of misinformation presented in the PLS analysis. It would be constructive if the organisations that conducted this survey made the data openly available so that policy makers can carry out their own unbiased analysis.

LIBER looks forward to an informed discussion over the coming months and is happy to provide further information based on the experiences of its 400+ members and on the in-depth conversations that we have had with researchers from various disciplines who are engaged in TDM.

Release date 03 September 2015

