Curation is often thought of as a passive activity: once deposited, content is simply “preserved” in perpetuity. This couldn’t be further from the truth, and Kevin Ashley, Director of the Digital Curation Centre, made the point deftly in his plenary talk at the end of day one of UKSG. If there is a person who can hold a group in rapt attention after a long day of sessions, it is certainly Kevin.
Much like the curation of publications, active curation of research data is critical to its good stewardship. Curation implies active management and dealing with change, particularly technological change related to electronic information. Ashley made the point that while curation and preservation are linked, they are not synonymous activities. Curation is both slightly more and slightly less than preservation. It implies an active process of cutting, weeding and managing the content, and regularly deciding when things should be retired from the collection. Ashley also made the point that there are benefits to good preservation management: it can generate increased impact, add a layer of accountability, and address some legal requirements.
Interestingly, within the UK, while most of the Research Councils place data management expectations on the researcher, the Engineering and Physical Sciences Research Council (EPSRC) has begun placing those expectations on the institutions, not on the PIs. (NOTE CORRECTION: this applies only to EPSRC, not all RCs as originally noted.) In part, the UK’s system of educational funding allows for this type of central control over institutions. Each approach has its benefits, but from a curatorial perspective the institutional mandate will likely ensure a longer-term and more sustainable environment for preservation. The current mandate in the UK is that data be securely preserved for a minimum of 10 years from the last use. Realistically, this is a useful approach for determining what is most valuable. If data are being re-used regularly, then curating them for the next 100 years or more is a good thing. Any content creator would hope for that type of success in the long-term continued use of their data.
The other aspect of data curation is to support the data’s eventual use. “Hidden data are wasted data,” Ashley proclaimed. Again, it is important to reflect on why we are preserving this information: for use and reuse. This reinforces the need to actively encourage and manage digital data curation.
Particularly from a data sharing perspective, data are more than an add-on to the publication process, but they also pose some other challenges. One example Ashley described is that “data are often living,” by which he means that data are frequently updated or added to, so the thing an institution is preserving is constantly changing. This poses technical problems as well as issues with metadata creation and preservation.
There are several ongoing projects related to scientific data curation, use and reuse. Those interested in more information should certainly look to some of the reports that the DCC has published on What is Digital Curation?, Persistent Identifiers, and Data Citation and Linking. There is also a great deal of work being undertaken by DataCite and the Dryad project. NISO and NFAIS are working on a project on how best to tie these supplemental materials to the articles to which they are related. One question this project is addressing is who in the scholarly communications community should be responsible for curation of these digital objects.
One might well reflect on one of the quotes with which Ashley began his presentation:
“The future belongs to companies and people that turn data into products”
-- Mike Loukides, O’Reilly.
If this is really to be the case, ensuring those data are available for the long term will be a crucial element of that future.