This section breaks down different topics required for the planning and preparation of data used in research at Case Western Reserve University. In this phase you should understand the research being conducted, the type and methods used for collecting data, the methods used to prepare and analyze the data, addressing budgets and resources required, and have a sound understanding of how you will manage data activities during your research project.
Many federal sponsors of Case Western Reserve funded research have required data sharing plans in research proposals since 2003. As of Jan. 25, 2023, the National Institutes of Health has revised its data management and sharing requirements.
This website is designed to provide basic information and best practices to seasoned and new investigators as well as detailed guidance for adhering to the revised NIH policy.
Basics of Research Data Management
Research data management (RDM) comprises a set of best practices that include file organization, documentation, storage, backup, security, preservation, and sharing, which affords researchers the ability to more quickly, efficiently, and accurately find, access, and understand their own or others' research data.
RDM practices, if applied consistently and as early in a project as possible, can save you considerable time and effort later, when specific data are needed, when others need to make sense of your data, or when you decide to share or otherwise upload your data to a digital repository. Adopting RDM practices will also help you more easily comply with the data management plan (DMP) required for obtaining grants from many funding agencies and institutions.
Research data must be retained in sufficient detail and for an adequate period of time to enable appropriate responses to questions about accuracy, authenticity, primacy and compliance with laws and regulations governing the conduct of the research. External funding agencies will each have different requirements regarding storage, retention, and availability of research data. Please carefully review your award or agreement for the disposition of data requirements and data retention policies.
Developing a Data Management Plan
A good data management plan begins by understanding the sponsor requirements funding your research. As a principal investigator (PI) it is your responsibility to be knowledgeable of sponsors requirements. The Data Management Plan Tool (DMPTool) has been designed to help PIs adhere to sponsor requirements efficiently and effectively. It is strongly recommended that you take advantage of the DMPTool.
CWRU has an institutional account with DMPTool that enables users to access all of its resources via your Single Sign On credentials. CWRU's DMPTool account is supported by members of the Digital Scholarship team with the Freedman Center for Digital Scholarship. Please use the RDM Intake Request form to schedule a consultation if you would like support or guidance regarding developing a Data Management Plan.
Some basic steps to get started:
- Sign into the DMPTool site to start creating a DMP for managing and sharing your data.
- On the DMPTool site, you can find the most up to date templates for creating a DMP for a long list of funders, including the NIH, NEH, NSF, and more.
- Explore sample DMPs to see examples of successful plans.
Be sure that your DMP is addressing any and all federal and/or funder requirements and associated DMP templates that may apply to your project. It is strongly recommended that investigators submitting proposals to the NIH utilize this tool.
The NIH is mandating Data Management and Sharing Plans for all proposals submitted after Jan. 25, 2023. Guidance for completing a NIH Data Management Plan has its own dedicated content to provide investigators detailed guidance on development of these plans for inclusion in proposals.
A Data Management Plan can help create and maintain reliable data and promote project success. DMPs, when carefully constructed and reliably adhered to, help guide elements of your research and data organization.
A DMP can help you:
Document your process and data
- Maintain a file with information on researchers and collaborators and their roles, sponsors/funding sources, methods/techniques/protocols/standards used, instrumentation, software (w/versions), references used, any applicable restrictions on its distribution or use.
- Establish how you will document file changes, name changes, dates of changes, etc. Where will you record of these changes? Try to keep this sort of information in a plain text file located in the same folder as the files to which it pertains.
- How are derived data products created? A DMP encourages consistent description of data processing performed, software (including version number) used, and analyses applied to data.
- Establish regular forms or templates for data collection. This helps reduce gaps in your data, promotes consistency throughout the project.
Explain your data
- From the outset, consider why your data were collected, what the known and expected conditions may be for collection, and information such as time and place, resolution, and standards of data collected.
- What attributes, fields, or parameters will be studied and included in your data files? Identify and describe these in each file that employs them.
- How will your team and others understand your terminology and data descriptors? Create a data dictionary to explain the contents of your files, including variables and definitions of codes (e.g. "a value of 9999 means no data".
- For an overview of data dictionaries, see the USGS page here: https://www.usgs.gov/products/data-and-tools/data-management/data-dictionaries
Why are you being asked to include a data management plan (DMP) in your grant application? For grants awarded by US governmental agencies, two federal memos from the US Office of Science and Technology Policy (OSTP), issued in 2013 and 2015, respectively, have prompted this requirement. These memos mandate public access to federally- (and, thus, taxpayer-) funded research results, reflecting a commitment by the government to greater accountability and transparency. While "results" generally refers to the publications and reports produced from a research project, it is increasingly used to refer to the resulting data as well.
Federal research-funding agencies have responded to the OSTP memos by issuing their own guidelines and requirements for grant applicants (see below), specifying whether and how research data in particular are to be managed in order to be publicly and properly accessible.
Data Management Best Practices
As you plan to collect data for research, keep in mind the following best practices.
Keep Your Data Accessible to You
- Store your temporary working files somewhere easily accessible, like on a local hard drive or shared server.
- While cloud storage is a convenient solution for storage and sharing, there are often concerns about data privacy and preservation. Be sure to only put data in the cloud that you are comfortable with and that your funding and/or departmental requirements allow.
- For long-term storage, data should be put into preservation systems that are well-managed. [U]Tech provides several long-term data storage options for cloud and campus.
- Don't keep your original data on a thumb drive or portable hard drive, as it can be easily lost or stolen.
- Think about file formats that have a long life and that are readable by many programs. Formats like ascii, .txt, .csv, .pdf are great for long term preservation.
- A DMP is not a replacement for good data management practices, but it can set you on the right path if it is consistently followed. Consistently revisit your plan to ensure you are following it and adhering to funder requirements.
- Know the difference between storing and preserving your data. True preservation is the ongoing process of making sure your data are secure and accessible for future generations. Many sponsors have preferred or recommended data repositories. The DMP tool can help you identify these preferred repositories.
- Identify data with long-term value. Preserve the raw data and any intermediate/derived products that are expensive to reproduce or can be directly used for analysis. Preserve any scripted code that was used to clean and transform the data.
- Whenever converting your data from one format to another, keep a copy of the original file and format to avoid loss or corruption of your important files.
- Leverage online platforms like OSF can help your group organize, version, share, and preserve your data, if the sponsor hasn’t specified a specific platform.
- Adhere to federal sponsor requirements on utilizing accepted data repositories (NIH dbGaP, NIH SRA, NIH CRDC, etc.) for preservation.
Backup, Backup, Backup
- The general rule is to keep 3 copies of your data: 2 copies onsite, 1 offsite.
- Backup your data regularly and frequently - automate the process if possible. This may mean weekly duplication of your working files to a separate drive, syncing your folders to a cloud service like Box, or dedicating a block of time every week to ensure you've copied everything to another location.
- Establish a consistent, descriptive filing system that is intelligible to future researchers and does not rely on your own inside knowledge of your research.
- A descriptive directory and file-naming structure should guide users through the contents to help them find whatever they are looking for.
- Use consistent, descriptive filenames that reliably indicate the contents of the file.
- If your discipline requires or recommends particular naming conventions, use them!
- Some best practices for naming conventions include:
- Do not use spaces between words. Use either camelcase or underscores to separate words
- Include LastnameFirstname descriptors where appropriate.
- Use a consistent date format: YYYY-MM-DD, YYYY_MM_DD, or YYYYMMDD.
- Avoid using MM-DD-YYYY formats
- Do not append vague descriptors like "latest" or "final" to your file versions. Instead, append the version's date or a consistently iterated version number.
Clean Your Data
- Mistakes happen, and often researchers don't notice at first. If you are manually entering data, be sure to double-check the entries for consistency and duplication. Often having a fresh set of eyes will help to catch errors before they become problems.
- Tabular data can often be error checked by sorting the fields alphanumerically to catch simple typos, extra spaces, or otherwise extreme outliers. Be sure to save your data before sorting it to ensure you do not disrupt the records!
- Programs like OpenRefine are useful for checking for consistency in coding for records and variables, catching missing values, transforming data, and much more.
What should you do if you need assistance implementing RDM practices?
Whether it's because you need discipline-specific metadata standards for your data, help with securing sensitive data, or assistance writing a data management plan for a grant, help is available to you at CWRU. In addition to consulting the resources featured in this guide, you are encouraged to contact your department's liaison librarian.
If you are planning to submit a research proposal and need assistance with budgeting for data storage and or applications used to capture, manage, and or process data UTech provides information and assistance including resource boilerplates that list what centralized resources are available.
More specific guidance for including a budget for Data Management and Sharing is included on this document: Budgeting for Data Management and Sharing.
Custody of Research Data
The PI is the custodian of research data, unless agreed on in writing otherwise and the agreement is on file with the University, and is responsible for the collection, management, and retention of research data. The PI should adopt an orderly system of data organization and should communicate the chosen system to all members of a research group and to the appropriate administrative personnel, where applicable. Particularly for long-term research projects, the PI should establish and maintain procedures for the protection and management of essential records.
Many funding agencies require data to be shared for the purposes of reproducibility and other important scientific goals. It is important to plan for the timely release and sharing of final research data for use by other researchers. The final release of data should be included as a key deliverable of the DMP. Knowledge of the discipline-specific database, data repository, data enclave, or archive store used to disseminate the data should also be documented as needed.
The NIH is mandating Data Management and Sharing Plans for all proposals submitted after Jan. 25, 2023. Guidance for completing a NIH Data Management and Sharing Plan has its own dedicated content to provide investigators detailed guidance on development of these plans for inclusion in proposals.