Curation is often thought of as a passive activity. Once deposited, content is simply
“preserved” in perpetuity. This
couldn’t be further from the truth, and Kevin
Ashley, Director of the Digital Curation
Centre, made the point deftly in his plenary talk at the end of day one of
UKSG. If anyone
could hold a group in rapt attention after a long day of sessions, it
would certainly be Kevin.
Much like the curation of publications, active curation of research data
is critical to its good stewardship. Curation implies active management and dealing
with change, particularly technological change related to electronic information. Ashley made the point that while
curation and preservation are linked, they are not synonymous activities. Curation is both slightly more and
slightly less than preservation.
Curation implies an active process of cutting, weeding and
managing the content, and regularly deciding when things should be retired from
the collection. Ashley also made the point that there are benefits to good preservation management. It
can generate increased impact, add a layer of accountability, and
address some legal requirements.
Interestingly, within the UK, while most of the Research Councils place data management expectations on the researcher, the Engineering and Physical Sciences Research Council (EPSRC) has begun
putting the expectations onto the institutions, not on the PIs. (Note: corrected; this applies only to EPSRC, not to all Research Councils as originally stated.) In part, the UK’s system of educational funding allows for this
type of central control over institutions.
Each approach has its benefits, but from a curatorial perspective the institutional
mandate will likely ensure a longer-term and more sustainable environment
for preservation. The current
EPSRC mandate is that data be securely preserved for a minimum of 10 years
from the last use. Realistically,
this is a useful approach for determining what is most valuable. If data are being re-used regularly,
then curating them for the next 100 years or more is a good thing. Any content creator would hope for that
type of success in the long-term continued use of their data.
The other aspect of data curation is actually to support the
data’s eventual use. “Hidden data are
wasted data,” Ashley proclaimed.
Again, it is important to reflect on why we are preserving this
information: for use and reuse. This
reinforces the need to actively encourage and manage digital data curation.
Particularly from a data sharing perspective, data are
more than an add-on to the publication process, but they also pose some other
challenges. One example Ashley
described is that “Data are often living”, by which he means that data are
frequently updated or added to, so the thing an institution is
preserving is constantly changing.
This poses technical problems as well as issues with metadata creation
and preservation.
There are several projects ongoing related to scientific data
curation, use and reuse. Those
interested in more information should certainly look to some of the reports
that the DCC has published on What
is digital Curation?, Persistent
Identifiers, and Data
Citation and Linking. There is
also a great deal of work being undertaken by DataCite
and the Dryad project. NISO and NFAIS are working on a project on how best to
tie these supplemental materials
to the articles to which they are related. One question this project is
addressing is who in the scholarly communications community should be responsible
for curation of these digital objects.
One might well reflect on one of the quotes that Ashley began
his presentation with:
“The future belongs to companies and people that turn data
into products”
-- Mike
Loukides, O’Reilly.
If this is really to be the case, ensuring those data are
available for the long term will be a crucial element of that future.
Todd
Thanks for your summary of my talk, and particularly for getting it online so quickly. But there's one inaccuracy I would like to correct before it spreads any further.
Most UK research funders are behaving just like those in the US and placing compliance requirements regarding research data on the PI. But there's one exception and it happens to be the largest research council - EPSRC. They are the ones placing requirements on institutions and specifying that data be kept for 10 years after last use.
And that action, by a single funder, has galvanised action at senior level in institutions in a way that other requirements placed on PIs have not.