Data-driven research has become increasingly common, and with it, researchers face more questions about what to do with their data. Federal grants in particular have requirements for making data accessible, and other funders and groups have similar rules in place. Even if data sharing is not mandated by the terms of a grant or institution, a researcher may still want to consider sharing their data. And because files and formats change over time, ensuring the long-term survival of research data is also a concern.
Creating a Data Management Plan
The first step to long-term data management is to create a data management plan. A librarian may be able to help you with this. You'll need to think about your data in order to determine the best format and method for storing it. For example, what is the context of your data? What format is it in? How could it be used in the future? Does it contain sensitive or confidential information? There are many other considerations in creating a data management plan as well, and all of them can help you and anyone working with you decide how best to keep your data. If you'd like to read more about data management plans or see examples, check out DMPTool.
Storing Research Data
Once you have a plan, you'll have a better idea of where and how to store your data. At Ithaca College, our Digital Commons is able to host data sets. Data on the Digital Commons is search-engine optimized, shared with the larger Digital Commons community, and easily accessible. The Digital Commons also tracks usage, so you will be able to tell how often people view your data. Look at an example data set on the Digital Commons to decide whether it would work for you. If you prefer another option, there are many available online; sites like re3data.org can help you find other data repositories.
If you plan to make your data open, either by choice or due to a funding mandate, you may want to take a few steps to ensure your data is usable and understandable to others. Prepping your data doesn't mean altering it; you should always share your raw data. However, there are some things you can do to increase its value and make the data set more usable.
For more on making data usable, check out "Some simple guidelines for effective data management" (Borer et al., 2009).
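One common way to make shared data more understandable without altering it is to publish a data dictionary alongside the raw file, describing each column. The Python sketch below is an illustration under assumptions, not a library-endorsed tool: it assumes the data is a CSV file with a header row, and the function name `build_data_dictionary` and the fields it reports (`type`, `missing`, `rows`) are hypothetical choices. It only reads the raw data and never modifies it.

```python
import csv
import io


def build_data_dictionary(csv_text):
    """Summarize each column of a CSV (hypothetical helper; assumes a header row).

    The raw text is only read; nothing in it is changed.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    # Per column: count of empty cells, whether every non-empty value
    # parses as a number, and whether any non-empty value was seen at all.
    stats = {name: {"missing": 0, "numeric": True, "seen": False} for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = (row.get(name) or "").strip()
            if value == "":
                stats[name]["missing"] += 1
                continue
            stats[name]["seen"] = True
            try:
                float(value)
            except ValueError:
                stats[name]["numeric"] = False

    def column_type(s):
        if not s["seen"]:
            return "empty"  # no non-empty values to judge from
        return "number" if s["numeric"] else "text"

    return [
        {
            "column": name,
            "type": column_type(stats[name]),
            "missing": stats[name]["missing"],
            "rows": rows,
        }
        for name in columns
    ]
```

For example, `build_data_dictionary("site,species,count\nA,oak,12\nB,maple,\nC,oak,7\n")` reports `site` and `species` as text columns and `count` as a number column with one missing value, across 3 rows. The result could be saved as its own file (for instance with `csv.DictWriter`) and shared next to the untouched raw data.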
Creating a Data Management Plan
Data Management Plan Resources and Examples
Guidelines for Effective Data Management Plans
Prepping your Data
OpenRefine (formerly Google Refine)
An open source tool for data transformation and cleanup.
Data and metadata conversion tools to prepare your data for publication.
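A typical cleanup task in a tool like OpenRefine is spotting inconsistent spellings of the same value (for example "New York" and "york, new"). OpenRefine's clustering feature offers a "fingerprint" key-collision method that, roughly, trims and lowercases each value, strips punctuation and accents, and sorts its unique words, so that variants collapse to the same key. The Python sketch below approximates that idea; it is not OpenRefine's own code, the names `fingerprint` and `cluster` are hypothetical, and its accent handling simply drops characters that have no ASCII equivalent.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Approximate a fingerprint-style key for one cell value (illustrative only)."""
    # Trim surrounding whitespace and lowercase.
    s = value.strip().lower()
    # Decompose accented letters and drop non-ASCII marks ("ë" -> "e").
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation.
    s = s.translate(str.maketrans("", "", string.punctuation))
    # Split into words, remove duplicates, sort, and rejoin.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with differing spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]
```

Running `cluster(["New York", "new york ", "York, New", "Boston", "Nëw York"])` returns a single group containing the four New York variants, while "Boston" is left out because it has no variants. The clusters are only suggestions to review; as with the advice above, the raw data itself is left unchanged.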
Analyzing your Data
Free statistical analysis program, with a bit of a learning curve.
Statistical analysis program available on library computers.
Free data visualization software that allows you to create interactive, embeddable graphics.
A fairly simple web-based application for creating and customizing data visualizations.
Open source software to create interactive, data-rich websites. Best for location-related datasets.
There are many, many sources of open data on the internet, but finding the type and level of data set you're looking for can be a challenge. Here are some suggestions of places to start:
DataONE is a data repository containing biological and environmental data sets. DataONE is easy to search and use.
Data.gov is a good place to go for data from the federal government. It contains a great deal of public data, although it's not always in the most usable format.
DataDryad provides free data sets and other educational resources, primarily biology data.
Many colleges and universities host data sets in their institutional repositories (IRs). Searching IRs takes a little more work, but it can lead to great finds. You can check out the University of Michigan's Deep Blue repository to see an example.
Ithaca College Library guide to Statistics and Data Sets