It didn’t take long for data, the
lifeblood of research, to enter the conversation at UKSG. The future of data
management and publication was raised early and often by both delegates and
speakers.
On the first morning, Geoffrey
Boulton of the University of Edinburgh, chair of the Royal Society working
group Science As An Open Enterprise, made a convincing case for the importance of open
data. Boulton reminded the assembled audience that laying open proof of your
experiments has been a foundational tenet of research since the 1800s,
but that in recent years the volume of data involved has exploded. Boulton published a paper in Nature in the 1980s which presented just
seven data points behind glaciological theory; nowadays a paper is just as
likely to have millions of data points sitting behind it.
Boulton posited that big data
and data modelling offer huge opportunities to academics, but that to
capitalise on these opportunities properly we need a system of sharing. Often,
data isn’t shared due to concerns about privacy, safety, security, or for
legitimate commercial concerns, but Boulton argued that publishers and funders
should be mandating ‘intelligently open data’ and that libraries should be
re-skilling to meet this demand. It should be the librarian’s role to help make
data discoverable and accessible, as part of a wider data ecosystem.
In a fascinating seminar, Ben
Ryan from the EPSRC talked through the reasoning behind the RCUK data
principles, the EPSRC
research data principles, and how they will be implemented practically. He
emphasised that the research councils see sharing data as a legitimate use of
research budgets, and that sharing data should be the default, whenever
possible.
He added that research organisations should
have the primary responsibility for ensuring researchers manage their data
effectively, and that it should be considered ‘research malpractice’ not to
make your data open. “We’ve gone past the days when scientists could be
trusted simply because they were scientists”, he said.
When it comes to publishing data, my colleague Iain Hrynaszkiewicz from Nature Publishing Group gave a
lightning talk charting the rise of the data journal. Scientific Data is one such journal[1]
which aims to incentivise researchers to share their data by providing a
citable output, linked to the original data stored in subject-specific
repositories or broad repositories such as figshare or Dryad. The Data
Descriptor (the article type published by Scientific
Data) was designed in collaboration with the academic community to make data more discoverable, interpretable and reusable. It ensures
that data isn’t forgotten and hidden away in the supplementary material to an
article, but is published for the world to see. Data Descriptors also aid in
reproducibility, ensuring that the methods for gathering data and conducting
research are laid open for others to potentially follow and recreate.
It’s often said that
open access is a journey, not a destination, and the same must be true of open
data. No talk about open access, discoverability or reproducibility could fail
to mention the importance of open data, and no doubt librarians will have a growing role in
the years to come in educating their clients and in facilitating open data.