Tuesday, 27 March 2012

Research data preservation projects meeting.

"It's a question of discipline," the little prince told me later on. "When you've finished washing and dressing each morning, you must tend your planet." ~Antoine de Saint-Exupéry, The Little Prince, 1943.

We had a JISC Research data preservation projects meeting on Thursday 22 March, with attendees from Cambridge, LSE, Bristol, and of course ULCC. Each project gave an update on its findings and progress, including a briefing on past and current training provision, and summarised the findings of its user survey. Although we had approached this in different ways (structured interviews, workshop, on-line questionnaire) our findings were remarkably similar. Even Bristol/DataSafe, which is concentrating on support staff and records data preservation, found resonances.

An important point which we see emerging again and again (and Cambridge, LSE and ULCC had all found this in our research) is that the phrase “research data” was not recognised by most researchers, especially in the arts and humanities. They simply could not relate to the term. “Data” implies science, structured data, and large amounts of it, whereas we are using it to mean any and all information, in any form, gathered for research purposes. LSE has already adopted “research material and data” as a catch-all phrase, which is a more accessible term. ULCC have not used the term “metadata” in relation to training as we consider it an alienating term.

We looked at comparing the identified needs of the trainee community. There was much discussion about the attitudinal aspects of managing research data. People often despair of the lack of appreciation of the value of research data and of why it should be kept. I personally think that we, the information managers, must share responsibility for this. Terms such as 'data' and 'metadata', for example, are meaningless and alienating for most people not involved in the management of information. In a way we need to address our own attitude to working with the research community. We have to develop ways of tailoring our approach and language to the non-information-management community. In a sense, what I feel we need is to tend to ourselves first in order to make sure that we communicate effectively with the outside world.

Otherwise, I noted that people are simply not getting advice or assistance from the institutions that are fostering their research; hence the value of our projects, if we pitch our advice correctly. Very often there are no guidelines available on the management of research data, so researchers are very much left to their own devices. This is demonstrated by their storage solutions: almost everyone we interviewed uses the cloud in one way or another. The exceptions were the few who knew the risks of the cloud (or who had actually read the terms and conditions).

Issues in preservation skills included: choosing and using appropriate file formats; incorporating data preservation into their project; working with repository criteria for research data deposit.

We agreed that no one method of delivery or approach would suit all our target audiences, but having material that could be re-purposed for several modes (e.g. group training and on-line learning) would be the best tactic. Furthermore, all projects are constrained in what they will produce by their project scopes and institution-specific requirements. We did, though, identify several areas where collaboration would be mutually beneficial, so we agreed the following joint actions:

Cambridge will set up a wiki to enable us to develop firstly a structure and set of questions for a FAQ, then secondly to develop where possible generic answers to these questions, accepting that some will need to be tailored for each institution.

LSE will develop and design a top-level brochure about research data preservation containing the core points and links to further information. This will be adapted from the similar-but-independent 4-point structures proposed by ULCC and Cambridge, namely: Explain it – Store it Safely – Share it – Start Early. And, as the little prince said, it is all a question of discipline; but communicate the 'why bother' effectively and it will be a less bitter pill to swallow, with remarkably beneficial results.

So, a good get together. More later.

Tuesday, 20 March 2012

Thomas Hobbes and the preservation of research data.

Thomas Hobbes had a bleak view of humanity to put it mildly.
He considered that, in the state of nature, competing desires amongst essentially equal human beings for limited supplies generate conflict and, in Hobbes' most famous phrase, the life of man is 'solitary, poor, nasty, brutish and short'.

Let's say we apply this to data management and why not? I am not a philosopher but if his idea is true then we would think that people are not interested in sharing resources or thinking beyond their immediate desires and needs. This research data is mine, hands off!

I am pessimistic at the best of times but after running our training on the preservation of research data entitled 'What's in it for me?', I felt less so by the end of the day. It seemed that people do want to share their research data after publication, as they want to enhance existing research and contribute to the body of work which is essential to the understanding of the thought processes involved in research output. And yes there is the unaltruistic side to us all, a bit of appealing to the immediate desires and needs as ultimately sharing your research data will enhance your standing in the community of expertise if it is well and often cited.

The premise of our training on March 14th was to lure folk in to speak about their experiences of preserving research data in the course of their research, while we learnt a whole lot from them about what they need, so that we can best plan and design an online course on this for the great History Spot site at IHR. Our cohort came from a variety of research backgrounds, which made it a rich day for gathering information about their needs and 'desires'.

'I lost my data in a USB key which fell into a cup of coffee' - Anonymous.

So why did people come to our workshop? People spoke about various drivers which brought them to us. Experiencing the loss of data seems to sharpen the mind somewhat when it comes to preservation of data. People also spoke about being 'swamped with data and the information overload', wanting to take care of the material they had gathered over the years and being worried they might lose it. Language struck a chord with many around the table. A lot of people don't use the word 'data' to describe their research material. The term 'data' is regarded as scientific and as a result people in the Humanities often feel alienated. We also reflected on the project so far, and on the knowledge base which we are gathering based on legacy data assessment and interviews.

The good, the bad and the ugly of research data

We thought that it would be good to show them examples of what I had found in the assessments and what I had heard in the interviews. Feedback was without exception good for the whole day and people seemed to take to this particular session! It demonstrated a variety of practical examples of documentation for research data, from well documented examples to inadequate ones to nothing at all. Lack of documentation about research data is a severe inhibitor to allowing access to it in the future. If the researcher does not write down information, both descriptive and technical, about the data, we will lose the capability to access it both intellectually and literally. Lack of safe storage was another point: people often didn't back up and relied heavily on the cloud for storage, not really knowing what they were agreeing to when they signed the terms and conditions of cloud services. However, some were well advanced, with good storage solutions and backups, good formats for preservation, and consideration of how to future-proof their material.

Intellectual Property Rights (IPR) rears its inevitable head and, as Kit Good has rightly pointed out, Data Protection and Freedom of Information can affect research data. Some people had data on living individuals and this would have implications in relation to data protection. Many people interviewed simply did not remember what permissions they had regarding use of the primary material they had copied or recorded. They had signed a piece of paper in the library or archive but didn't remember what it said. As a result they would not be able to share this data in the future, as copyright and usage were not clear.

Four good ideas

We gave an overview of four things which they could all do to enhance the preservation of their research data. Here are the main ideas, for each of which we gave practical solutions.

1. Write everything down.
2. Store your data safely.
3. Interventions are needed, the earlier the better!
4. Consider sharing, the why and how.

Golden Opportunities

A vital part of the workshop had to be participation. We really needed to find out what these delegates thought about the preservation of research data. We gave them six opportunities. These opportunities allowed everyone time to work alone or in pairs to think about various aspects of digital preservation. This was done using the innovative pen and paper method. Everyone had a chance this way to express their opinion, as we made them do so! We then wrote up all this feedback and presented it all back to them for review at the end of the afternoon.

These questions included:

1. Why bother keeping research data?
2. What are the risks of not keeping your research data?
3. Give us your examples of good and bad practice
4. What are your storage needs?
5. If you could have a single magic tool to do this, what would it be?
6. Are you comfortable with sharing your data at any time? If yes, why and if no, why?

We got tremendous answers which will guide us while developing our online course.

What feedback did we get?

The feedback was either 'good' or 'excellent'. What pleases us more, though, are the comments:

'More information on non microsoft software, I use Mac OS and open source which is often space hungry to me.' Patricia Croot

'Very well communicated and outlined, whole presentation extremely helpful and well organised. Look forward to more detailed training'...

'Selection of material for preservation will be necessary, I think, as every researcher generates so much data that preserving it all will be a full time job in itself' Pernille Richards.

Thanks to everyone for such a good afternoon, and now to work honing our Moodle skills!

Thursday, 8 March 2012

Personal data, public information - Research data and information law

The SHARD project is looking at the preservation of research data for the traditional requirements of peer review, re-use and retention of digital assets. I have been asked to briefly cover – and it’s the ‘briefly’ that’s the challenge – for the project blog the subject of ‘access to information’ legislation and how it relates to the management of research data. Preserving your research data also has a legal context that is worthy of serious consideration.

What do I mean by ‘access to information’? Let’s get the acronyms established early on for three pieces of legislation. The first is the Data Protection Act 1998 (DPA), which is concerned with the ‘personal data’ of living identifiable individuals. The other two – the Freedom of Information Act 2000 (FOIA) and the Environmental Information Regulations 2004 (EIR) – are concerned with ‘public’ information held by ‘public authorities’. Research data can be covered by all three.

Many researchers are looking at these issues already. Data management plans are a routine requirement for many research funding bodies. If there is no data management plan available for your research, use one of the available templates provided by your institution or by organisations such as JISC and the Digital Curation Centre.

Personal data

Research data that contains reference to living individuals – interview scripts, contact details, even statistical information relating to small numbers of individuals etc. – should be managed according to the eight principles of the DPA. I won’t go into too much detail about this here, as there is so much guidance already available, suffice to say that the following should be considered:

  • Do the individuals identified in your research data know how and for what purpose their data is being held? Have they given their consent?

  • Is there provision to store the personal data safely and securely?

  • How long are you planning to hold the personal data for? If the answer is ‘forever’, can you anonymise it and still retain its value?

If you are unsure, do ask your institution’s Data Protection Officer or similar information compliance contact. They will be keen to help. The Information Commissioner’s Office (ICO), the UK’s information and privacy regulator, now has enforcement powers to fine organisations up to £500,000 for the loss of, or unauthorised access to, personal data. The rigour of a data management plan is therefore vital not just in protecting your research project, but your institution as a whole.

Freedom of Information

Since 2005, all organisations defined as ‘public authorities’ in England, Wales and Northern Ireland are subject to the Freedom of Information Act 2000 (FOIA). In Scotland they follow the similar (with at least one important difference for research data, as we will see) Freedom of Information (Scotland) Act 2002 (FOISA). The crux of the Act is that the public has a right of access to information ‘held’ by public authorities. If asked for information, the authority has to confirm that it is held and provide it, unless a legal exemption applies. The Environmental Information Regulations 2004 provide a right of access to ‘environmental information’ under similar timescales, with some slight differences in detail from FOIA but, for the purposes of this blog, my statements should generally cover both.

Universities are defined as public authorities by the Act and therefore obliged to respond to FOIA requests. This is not always as simple as it sounds, in that unlike many other public authorities, Universities operate in a competitive, increasingly international environment with an ever-decreasing proportion of public funding. More nuanced still is the relationship of the individual academic with ‘their’ research data, produced in everything from solitary sabbatical study to global partnerships of research institutions. At the same time, there is a significant ‘open access’ movement in academia which is arguing for the pro-active publication of research data through online journals and repositories.

FOIA and EIR requests have been made for research data and in some cases have required the Information Commissioner’s Office (ICO) to issue a ‘Decision Notice’ in order to ensure disclosure. Queen’s University Belfast were ordered by the ICO to release over 40 years of research data on tree rings, used for climate research (see the news item) under the EIR legislation.

There are, however, several exemptions in the FOI Act that can apply to research data requests. Section 22 ‘Intended for future publication’ allows a University to exempt information that will be published later. Section 43 ‘Commercial Interests’ exempts the disclosure of information which could prejudice the commercial interests of the University or another party, such as a partner institution or research funding body. If your research data contains personal data, then parts of it are likely to be exempt under Section 40 ‘Personal Information’. FOISA includes a specific research data exemption – Section 27(2) – but even so this derives from the general principle of ‘intended for future publication’ and is unlikely to prevent disclosure of research data held in the manner of the ‘tree ring’ dataset.

It is definitely worth reading the ICO’s guidance for the Higher Education sector around FOIA.

Once again, if you are unsure, do ask your institution’s Freedom of Information Officer or similar information compliance contact. Try and envisage in your data management plan how you would deal with a request for your research data. It may be that public disclosure of research data is a desired outcome of the project; it may require some serious consideration and discussion amongst the research team.


Access to information legislation in the UK can apply to research data. This can have important implications for a research project and therefore acts as another driver for ensuring that your data is managed and preserved effectively. Ensure that you create a data management plan when starting on a new project and discuss any issues with the FOI/DPA compliance officers at your institution.

Monday, 5 March 2012

SHARD workshop

Preservation and research data: what's in it for me?

Event type:
14 March 2012

Registration for this workshop is now CLOSED as we have reached capacity

All researchers create research data, from the database you've set up for your PhD, to your extensive collection of references, to more complex and extensive projects.

As researchers you have two choices:

A. Do you want to ensure that your research data gets the attention and validation it deserves over time? Do you want to make sure that it is safe, accessible and shareable as a valued resource for as long as possible?

B. Or would you rather it disappeared, became corrupted and was never used again with little or no opportunity for reuse and academic recognition of your research?

If you choose the former then we strongly recommend that you attend this FREE workshop.

We will cover issues such as why researchers need to bother about best practice in relation to managing their research data to ensure access over time.

  • Case studies: The good, the bad and the ugly of research data in relation to ensuring that it is accessible over time.
  • Quick wins: Some simple things you can do to make sure your data lasts.
  • Tools: Things that can help you do these things.

The workshop will be held on 14 March 2012, from 2.00 to 4.30.

Event Location:
Stewart House (Room ST265)
University of London, 32 Russell Square
London WC1B 5DN
United Kingdom