Tuesday, 27 March 2012

Kevin Ashley on the Curation of Digital Data

Curation is often thought of as a passive activity: once deposited, content is simply “preserved” in perpetuity. This couldn’t be further from the truth, and Kevin Ashley, Director of the Digital Curation Centre (DCC), made the point deftly in his plenary talk at the end of day one of UKSG. If anyone could hold a group in rapt attention after a long day of sessions, it would certainly be Kevin.

Much like the curation of publications, active curation of research data is critical to its good stewardship. Curation implies active management and dealing with change, particularly the technological change that affects electronic information. Ashley made the point that while curation and preservation are linked, they are not synonymous activities. Curation is both slightly more and slightly less than preservation. It implies an active process of cutting, weeding and managing the content, and regularly deciding when things should be retired from the collection. Ashley also made the point that there are benefits to good preservation management: it can generate increased impact, add a layer of accountability, and address some legal requirements.
The DCC Curation Life Cycle model, shown during Ashley's presentation

Interestingly, within the UK, while most of the Research Councils place data management expectations on the researcher, the Engineering and Physical Sciences Research Council (EPSRC) has begun placing those expectations on institutions rather than on PIs. (NOTE CORRECTION: this applies only to EPSRC, not all RCs as originally noted.) In part, the UK’s system of educational funding allows for this type of central control over institutions. Each approach has its benefits, but from a curatorial perspective the institutional mandate will likely ensure a longer-term and more sustainable environment for preservation. The current EPSRC mandate is that data be securely preserved for a minimum of 10 years from the last use. Realistically, this is a useful approach for determining what is most valuable. If data are being re-used regularly, then curating them for the next 100 years or more is a good thing. Any content creator would hope for that kind of long-term, continued use of their data.

The other aspect of data curation is supporting the data’s eventual use. “Hidden data are wasted data,” Ashley proclaimed. Again, it is important to reflect on why we are preserving this information: for use and reuse. That purpose reinforces the need to actively encourage and manage digital data curation.

Particularly from a data sharing perspective, data are more than an add-on to the publication process, but they also pose some other challenges. One example Ashley described is that “data are often living,” by which he means that data are frequently updated or added to, so the thing an institution is preserving is constantly changing. This poses technical problems as well as issues with metadata creation and preservation.

There are several ongoing projects related to scientific data curation, use and reuse. Those interested in more information should certainly look at some of the reports the DCC has published: What is Digital Curation?, Persistent Identifiers, and Data Citation and Linking. There is also a great deal of work being undertaken by DataCite and the Dryad project. NISO and NFAIS are working on a project on how best to tie these supplemental materials to the articles to which they relate. One question this project is addressing is who in the scholarly communications community should be responsible for curating these digital objects.

One might well reflect on one of the quotes with which Ashley began his presentation:
 “The future belongs to companies and people that turn data into products”
-- Mike Loukides, O’Reilly. 
If this is really to be the case, ensuring those data are available for the long term will be a crucial element of that future.

1 comment:

  1. Todd
    Thanks for your summary of my talk, and particularly for getting it online so quickly. But there's one inaccuracy I would like to correct before it spreads any further.

    Most UK research funders are behaving just like those in the US and placing compliance requirements regarding research data on the PI. But there's one exception and it happens to be the largest research council - EPSRC. They are the ones placing requirements on institutions and specifying that data be kept for 10 years after last use.

    And that action, by a single funder, has galvanised action at senior level in institutions in a way that other requirements placed on PIs have not.