Tuesday 27 March 2012

Defining value: putting dumb numbers to work

Grace Baynes of Nature Publishing Group leading a group discussion on the use (and abuse) of analytics, with a good mixed group of librarians, publishers and one precious researcher.

What do we mean by value?
We're focussing today on the *relative* worth, utility or importance of something - numbers by themselves aren't that valuable; we need a context. We can measure downloads, time spent, social media influence, but just because we can measure something doesn't mean it is helpful or valuable to do so. We need to become more refined in how we are applying metrics.

What do we want to know the value of?
We talk primarily about journal value, but article-level metrics are increasingly important, as is the value of an individual researcher.

What indicators can we use to measure value?
Impact factor, cost, return, usage, meeting demand (qualitatively assessed). Usage breaks down in a number of ways and combines with other data, e.g. to calculate cost per use - but what represents good value? Does it vary from field to field? How do you incorporate value judgements about the nature of the usage? Nature doing some preliminary research with Thomson Reuters here, looking at local cost per citation (i.e. comparing usage within an institution to citations of those articles by authors within that institution), in comparison to competing journals (Science, Cell, PLOS Biology). They picked some of the leading institutions in the US, and also looked at the number of authors and number of citations across key journals. Grace throwing this out to the librarians - is this interesting? Would this data be useful in evaluating your collections?
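[Rough sketch of the arithmetic behind those two ratios - journals, costs and counts invented for illustration, not the NPG / Thomson Reuters methodology:]

```python
# Illustrative sketch only: the journals, costs and counts below are made up,
# and this is not NPG's or Thomson Reuters' actual methodology.

def cost_per_use(annual_cost, downloads):
    """Subscription cost divided by full-text downloads (e.g. COUNTER-style counts)."""
    return annual_cost / downloads

def local_cost_per_citation(annual_cost, local_citations):
    """Subscription cost divided by citations made by the institution's own authors."""
    return annual_cost / local_citations

journals = {
    # journal: (annual subscription cost, downloads at the institution,
    #           citations to the journal by the institution's own authors)
    "Journal A": (15000, 42000, 310),
    "Journal B": (9000, 8000, 45),
}

for name, (cost, downloads, local_cites) in journals.items():
    print(f"{name}: cost/use = {cost_per_use(cost, downloads):.2f}, "
          f"local cost/citation = {local_cost_per_citation(cost, local_cites):.2f}")
```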

Moved on to discuss the Eigenfactor - a Google PageRank for journals? Combining impact factor / citations and usage data in a complex algorithm.
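[Sketch of the PageRank idea applied to a toy journal citation matrix - just the flavour of it, not the actual Eigenfactor algorithm, which handles self-citations, article counts and weighting rather differently:]

```python
# Toy PageRank-style ranking over a journal-to-journal citation matrix.
# A sketch of the general idea only; the real Eigenfactor algorithm differs
# in detail (self-citations excluded, article-count weighting, etc.).
import numpy as np

journals = ["J1", "J2", "J3"]
# citations[i, j] = citations from journal j to journal i (made-up numbers)
citations = np.array([
    [0, 30, 10],
    [20, 0, 40],
    [10, 20, 0],
], dtype=float)

# Column-normalise so each citing journal distributes one unit of influence
transition = citations / citations.sum(axis=0)

damping = 0.85
rank = np.full(len(journals), 1 / len(journals))
for _ in range(100):
    rank = (1 - damping) / len(journals) + damping * transition @ rank

print(dict(zip(journals, rank.round(3))))
```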

Peer review - F1000's expert evaluations being turned into a ranking system.
What about truly social metrics, e.g. altmetrics? (Explore the free demo from Digital Science, looking at tweets, blogs, Reddit etc.) Also ref. Symplectic and SciVal as examples of visualising analytics data.

Questions: as we move to OA, cost per use becomes less important - what metrics will become more important? E.g. speed to publication? BioMed Central publishes this for each article. How would it be translated into an easy-to-measure value?

Are all downloads equally valuable? OUP did some research into this: good articles would get double downloads (initially viewed in HTML, then downloaded as PDF if useful), so that conversion is one good indicator of actual value. Likewise, if people who *could* download the full text didn't bother, having read the abstract, that's a potential indicator of non-value.
But this approximation of value becomes less reliable as the PDF becomes less popular as a format. Assuming that a download is more valuable if it leads to a citation is flawed - what about teaching value? Point-of-care use? Local citation gives a flawed picture.
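[Toy illustration of that HTML-to-PDF conversion signal - hypothetical counts, not OUP's data or method:]

```python
# Hypothetical per-article usage counts illustrating the HTML-to-PDF
# "conversion" signal discussed above; not OUP's actual data or methodology.
usage = {
    # article id: (HTML views, PDF downloads)
    "art-001": (500, 320),
    "art-002": (450, 15),
    "art-003": (60, 55),
}

for article, (html_views, pdf_downloads) in usage.items():
    conversion = pdf_downloads / html_views if html_views else 0.0
    print(f"{article}: {conversion:.0%} of HTML viewers went on to download the PDF")
```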

The library experience
Big institutions have to use crude usage metrics to inform collection development, because more detailed work (as reported by this morning's plenary speaker Anne Murphy) is not viable at scale. But librarians know that an undergraduate download is less valuable than a postgrad download, in the sense of how important that precise article is to the reader. Citation too doesn't equal value - did they really read, understand, develop as a result of reading that article?

Ask users the reason they're requesting an ILL: what are you using this for? Fascinating insight into the different ways that content is valued. Example from healthcare: practitioners delaying treatment until they can consult a specific article. "We're sitting on a gold mine" - the value of access to information services - showing impact. [Perhaps useful for publishers to try and capture this type of insight too - exit overlay surveys on journal websites, perhaps?]

Role of reading lists - can we "downgrade" usage where we know it has been heavily influenced by a reading list? Need to integrate reading lists and usage better.

As well as looking at usage and citations, there's a middle layer - social bookmarking sites and commentary can also indicate use / value and are much more immediate to the action of using the article than the citation.

What impact will changing authentication systems have? Will Shibboleth help us break down usage by user type? This is what the Raptor project does - sifting Shibboleth / EZProxy logs to identify users, but it relies on the service provider having maintained the identifiers and passing them back to the institution with the usage data. There's a current bid in to JISC to combine JUSP with Raptor - agreement from the audience that this would be *HUGELY USEFUL* (hint, hint, please, JISC!)
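[Sketch of the kind of breakdown this could enable, assuming the log usernames have already been resolved to user types via the institutional directory (roughly what Raptor automates) - journals, field names and counts all invented:]

```python
# Sketch of a usage-by-user-type breakdown, assuming usernames from
# Shibboleth / EZProxy logs have already been resolved to a user type
# via the institutional directory. All values here are invented.
from collections import Counter

events = [
    # (journal, user type) for each full-text download
    ("Nature", "undergraduate"),
    ("Nature", "postgraduate"),
    ("Nature", "staff"),
    ("Cell", "postgraduate"),
]

by_type = Counter(events)
for (journal, user_type), count in sorted(by_type.items()):
    print(f"{journal}: {count} download(s) by {user_type}s")
```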

Do people have the time to use the metrics available? One delegate recommends Tableau instead of Excel to analyse data - better dashboards.

Abuse of metrics: ref. again to the problem with the impact factor using the mean rather than the median, and examples of when that has caused problems (a sudden leap to an impact factor in the thousands thanks to one popular article). The impact factor also cannot cope when bad articles are cited a lot *because* they're bad - not all citations are equal.
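[Quick illustration, with made-up citation counts, of how one runaway article drags the mean far above the median:]

```python
# Made-up citation counts for one journal's articles, showing how a single
# very popular article drags a mean-based (impact-factor-style) figure
# far above the median.
from statistics import mean, median

citations = [0, 1, 1, 2, 2, 3, 4, 5] + [12000]  # one runaway hit

print("mean  :", round(mean(citations), 1))   # pulled up into the thousands by the outlier
print("median:", median(citations))           # unaffected
```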

"Numbers are dumb unless you use them intelligently," says Grace. "We need to spend less time collecting the data, and more time assessing what it means."
