Monday 26 March 2012

Digging for the unknown know - If you have too much to read can a computer do it for you?

Breakout session with Eefke Smit and Maurits van der Graaf, presenting highlights of the PRC study on text mining

Some highlights from the survey:

Requests from 3rd parties

77% of publishers who responded to the survey receive mining requests, but the number of requests per year is low: only 21% receive more than 10 per year.

Requests from corporate customers and abstracting and indexing companies are the most frequent.

Most who don’t receive mining requests are Open Access publishers or very small publishers.

Of those who receive requests, 32%, mostly OA publishers, grant permission without restrictions. The rest consider requests on a case-by-case basis, depending on the purpose of the mining.

53% of publishers decline requests that would create products that compete with or replace their original offerings.

31% of publishers ask for a fee if the mining is for commercial purposes.

Publishers mining their own data

46% of publishers surveyed currently mine their own content. They are doing this for a number of reasons:

  • To improve retrieval of content
  • To generate better metadata and to add semantic tagging
  • To create new products

Of the 54% who don’t currently mine their own content, 36% plan to start mining in the next year.

Dilemma for publishers

On the plus side, allowing text mining requests from 3rd parties can drive traffic to publishers' content. On the down side, the outputs of text mining could function as a substitute for the original content. Finding the right balance is key.

Obstacles and solutions?

Lots of discussion from the publishers, libraries and technology vendors in the session about what the obstacles are and what the solutions might be.

Licensing and copyright issues were a big concern. Would sample licences help? STM have created a sample licence to try and prevent everyone knitting their own jumpers (to borrow a phrase from Stephen Abram's session this morning). Pharmaceutical companies are also developing their own model licences.

Lots of discussion about whether an aggregated solution might be the answer. Is there a role for discovery services such as Primo or Summon (other discovery services are available!) in providing an aggregated text mining service? They are already aggregating a lot of data and have relationships with publishers and libraries, so could this be expanded on? For many, especially pharmaceutical companies, this wouldn't be comprehensive enough, as the value comes from also being able to include their own locally hosted, proprietary or commercially sensitive data.

Interesting session prompting lots of food for thought!

