Developing a Data Management Plan

This section covers the topics involved in planning and preparing data used in research at Case Western Reserve University. In this phase you should understand the research being conducted, the types of data and the methods used to collect them, the methods used to prepare and analyze the data, and the budget and resources required. You should also have a sound understanding of how you will manage data activities during your research project.

Many federal sponsors of Case Western Reserve funded research have required data sharing plans in research proposals since 2003. As of Jan. 25, 2023, the National Institutes of Health has revised its data management and sharing requirements. 

This website is designed to provide basic information and best practices to seasoned and new investigators as well as detailed guidance for adhering to the revised NIH policy.  

Basics of Research Data Management

Research data management (RDM) comprises a set of best practices that include file organization, documentation, storage, backup, security, preservation, and sharing, which affords researchers the ability to more quickly, efficiently, and accurately find, access, and understand their own or others' research data.

RDM practices, if applied consistently and as early in a project as possible, can save you considerable time and effort later, when specific data are needed, when others need to make sense of your data, or when you decide to share or otherwise upload your data to a digital repository. Adopting RDM practices will also help you more easily comply with the data management plan (DMP) required for obtaining grants from many funding agencies and institutions.

Research data must be retained in sufficient detail and for an adequate period of time to enable appropriate responses to questions about accuracy, authenticity, primacy and compliance with laws and regulations governing the conduct of the research. External funding agencies will each have different requirements regarding storage, retention, and availability of research data. Please carefully review your award or agreement for the disposition of data requirements and data retention policies.

Developing a Data Management Plan

A good data management plan begins with understanding the requirements of the sponsor funding your research. As a principal investigator (PI), it is your responsibility to be knowledgeable about your sponsor's requirements. The Data Management Plan Tool (DMPTool) has been designed to help PIs adhere to sponsor requirements efficiently and effectively. It is strongly recommended that you take advantage of the DMPTool.

CWRU has an institutional account with DMPTool that enables users to access all of its resources via their Single Sign On credentials. CWRU's DMPTool account is supported by members of the Digital Scholarship team with the Freedman Center for Digital Scholarship. Please use the RDM Intake Request form to schedule a consultation if you would like support or guidance regarding developing a Data Management Plan.

Some basic steps to get started:

Be sure that your DMP addresses all federal and/or funder requirements and associated DMP templates that may apply to your project. It is strongly recommended that investigators submitting proposals to the NIH use the DMPTool.

The NIH is mandating Data Management and Sharing Plans for all proposals submitted after Jan. 25, 2023. Guidance for completing an NIH Data Management Plan has its own dedicated content to provide investigators detailed guidance on development of these plans for inclusion in proposals.

A Data Management Plan can help create and maintain reliable data and promote project success. DMPs, when carefully constructed and reliably adhered to, help guide elements of your research and data organization.

A DMP can help you:

Document your process and data

  • Maintain a file with information on researchers and collaborators and their roles, sponsors/funding sources, methods/techniques/protocols/standards used, instrumentation, software (with versions), references used, and any applicable restrictions on distribution or use of the data.
  • Establish how you will document file changes, name changes, dates of changes, etc. Where will you record these changes? Try to keep this sort of information in a plain text file located in the same folder as the files to which it pertains.
  • How are derived data products created? A DMP encourages consistent description of data processing performed, software (including version number) used, and analyses applied to data.
  • Establish regular forms or templates for data collection. This helps reduce gaps in your data and promotes consistency throughout the project.
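
The plain text change record suggested above could be kept with a small helper like the following. This is a minimal sketch, not a prescribed method: the function name log_change, the file name CHANGELOG.txt, and the tab-separated layout are all assumptions made for illustration.

```python
from datetime import date
from pathlib import Path


def log_change(folder, filename, description, author, when=None):
    """Append one dated entry to CHANGELOG.txt in the same folder as the data file.

    The log file name and the tab-separated layout are illustrative choices.
    """
    when = when or date.today()
    entry = f"{when.isoformat()}\t{filename}\t{author}\t{description}\n"
    log_path = Path(folder) / "CHANGELOG.txt"
    # Append mode keeps earlier entries intact, so the log is a running history.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry)
    return entry
```

For example, log_change("data", "survey_2024-03-01.csv", "Renamed from survey_final.csv", "DoeJane") adds one line to data/CHANGELOG.txt, assuming the data folder already exists.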

Explain your data

  • From the outset, consider why your data were collected, what the known and expected conditions may be for collection, and information such as time and place, resolution, and standards of data collected.
  • What attributes, fields, or parameters will be studied and included in your data files? Identify and describe these in each file that employs them.
  • How will your team and others understand your terminology and data descriptors? Create a data dictionary to explain the contents of your files, including variables and definitions of codes (e.g., "a value of 9999 means no data").
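
A data dictionary of the kind described above might look like the sketch below. The variable names (site_id, temp_c), the column layout, and both helper functions are hypothetical; only the "9999 means no data" convention comes from the example in the text.

```python
import csv
import io

# Hypothetical data dictionary: one entry per variable in the data file.
DATA_DICTIONARY = {
    "site_id": {"description": "Collection site identifier", "units": "", "missing_code": ""},
    "temp_c": {"description": "Air temperature", "units": "degrees Celsius", "missing_code": "9999"},
}


def write_dictionary(dictionary, handle):
    """Write the dictionary as CSV so it can be stored alongside the data files."""
    writer = csv.writer(handle)
    writer.writerow(["variable", "description", "units", "missing_code"])
    for name, info in dictionary.items():
        writer.writerow([name, info["description"], info["units"], info["missing_code"]])


def decode_missing(row, dictionary):
    """Replace documented missing-value codes with None in one record."""
    cleaned = {}
    for name, value in row.items():
        code = dictionary.get(name, {}).get("missing_code", "")
        cleaned[name] = None if code and value == code else value
    return cleaned
```

Keeping the missing-value code in the dictionary, rather than only in someone's memory, lets analysis scripts treat 9999 as "no data" instead of as a real measurement.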

DMP Requirements

Why are you being asked to include a data management plan (DMP) in your grant application? For grants awarded by US governmental agencies, two federal memos from the US Office of Science and Technology Policy (OSTP), issued in 2013 and 2015, respectively, have prompted this requirement. These memos mandate public access to federally- (and, thus, taxpayer-) funded research results, reflecting a commitment by the government to greater accountability and transparency. While "results" generally refers to the publications and reports produced from a research project, it is increasingly used to refer to the resulting data as well.

Federal research-funding agencies have responded to the OSTP memos by issuing their own guidelines and requirements for grant applicants (see below), specifying whether and how research data in particular are to be managed in order to be publicly and properly accessible.

  • NSF—National Science Foundation
    "Proposals submitted or due on or after January 18, 2011, must include a supplementary document of no more than two pages labeled 'Data Management Plan'. This supplementary document should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results." Note: Additional requirements may apply per Directorate, Office, Division, Program, or other NSF unit.
  • NIH—National Institutes of Health
    "To facilitate data sharing, investigators submitting a research application requesting $500,000 or more of direct costs in any single year to NIH on or after October 1, 2003 are expected to include a plan for sharing final research data for research purposes, or state why data sharing is not possible."
  • NASA—National Aeronautics and Space Administration
    "The purpose of a Data Management Plan (DMP) is to address the management of data from Earth science missions, from the time of their data collection/observation, to their entry into permanent archives."
  • DOD—Department of Defense
    "A Data Management Plan (DMP) describing the scientific data expected to be created or gathered in the course of a research project must be submitted to DTIC at the start of each research effort. It is important that DoD researchers document plans for preserving data at the outset, keeping in mind the potential utility of the data for future research or to support transition to operational or other environments. Otherwise, the data is lost as researchers move on to other efforts. The essential descriptive elements of the DMP are listed in section 3 of DoDI 3200.12, although the format of the plan may be adjusted to conform to standards established by the relevant scientific discipline or one that meets the requirements of the responsible Component"
  • Department of Education
    "The purpose of this document is to describe the implementation of this policy on public access to data and to provide guidance to applicants for preparing the Data Management Plan (DMP) that must outline data sharing and be submitted with the grant application. The DMP should describe a plan to provide discoverable and citable dataset(s) with sufficient documentation to support responsible use by other researchers, and should address four interrelated concerns—access, permissions, documentation, and resources—which must be considered in the earliest stages of planning for the grant."
  • Office of Scientific and Technical Information (OSTI)
    Provides access to free, publicly-available research sponsored by the Department of Energy (DOE), including technical reports, bibliographic citations, journal articles, conference papers, books, multimedia, software, and data.

Data Management Best Practices

As you plan to collect data for research, keep in mind the following best practices. 

Keep Your Data Accessible to You

  • Store your temporary working files somewhere easily accessible, like on a local hard drive or shared server.
  • While cloud storage is a convenient solution for storage and sharing, there are often concerns about data privacy and preservation. Only put data in the cloud if you are comfortable storing it there and your funding and/or departmental requirements allow it.
  • For long-term storage, data should be put into preservation systems that are well-managed. [U]Tech provides several long-term data storage options for cloud and campus. 
  • Don't keep your original data on a thumb drive or portable hard drive, as it can be easily lost or stolen.
  • Think about file formats that have a long life and that are readable by many programs. Formats like plain ASCII text (.txt), .csv, and .pdf are well suited to long-term preservation.
  • A DMP is not a replacement for good data management practices, but it can set you on the right path if it is consistently followed. Consistently revisit your plan to ensure you are following it and adhering to funder requirements.

Preservation

  • Know the difference between storing and preserving your data. True preservation is the ongoing process of making sure your data are secure and accessible for future generations. Many sponsors have preferred or recommended data repositories. The DMPTool can help you identify these preferred repositories.
  • Identify data with long-term value. Preserve the raw data and any intermediate/derived products that are expensive to reproduce or can be directly used for analysis. Preserve any scripted code that was used to clean and transform the data.
  • Whenever converting your data from one format to another, keep a copy of the original file and format to avoid loss or corruption of your important files.
  • Online platforms like OSF can help your group organize, version, share, and preserve your data if the sponsor hasn't specified a platform.
  • Adhere to federal sponsor requirements on utilizing accepted data repositories (NIH dbGaP, NIH SRA, NIH CRDC, etc.) for preservation. 

Backup, Backup, Backup

  • The general rule is to keep 3 copies of your data: 2 copies onsite, 1 offsite.
  • Back up your data regularly and frequently, and automate the process if possible. This may mean weekly duplication of your working files to a separate drive, syncing your folders to a cloud service like Box, or dedicating a block of time every week to ensure you've copied everything to another location.
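
One way to automate the duplication step above is sketched below. It is an illustration under stated assumptions rather than a recommended tool: the function names are hypothetical, the backup locations are assumed to sit outside the source folder, and the checksum comparison is an added safeguard that the text does not itself require.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir, backup_dirs):
    """Copy every file under source_dir to each backup location and verify each copy.

    Assumes none of the backup locations is inside source_dir.
    """
    source_dir = Path(source_dir)
    copied = []
    for backup_dir in backup_dirs:
        for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
            target = Path(backup_dir) / source.relative_to(source_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            if sha256_of(source) != sha256_of(target):
                raise IOError(f"Checksum mismatch for {target}")
            copied.append(target)
    return copied
```

Calling back_up with one second onsite drive and one offsite or synced folder as the two backup locations would give the three copies described above. Scheduling the script to run weekly is left to the operating system's task scheduler.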

Organization

  • Establish a consistent, descriptive filing system that is intelligible to future researchers and does not rely on your own inside knowledge of your research.
  • A descriptive directory and file-naming structure should guide users through the contents to help them find whatever they are looking for.

Naming Conventions

  • Use consistent, descriptive filenames that reliably indicate the contents of the file.
  • If your discipline requires or recommends particular naming conventions, use them!
  • Some best practices for naming conventions include:
    • Do not use spaces between words. Use either camelCase or underscores to separate words.
    • Include LastnameFirstname descriptors where appropriate.
    • Use a consistent date format: YYYY-MM-DD, YYYY_MM_DD, or YYYYMMDD.
      • Avoid using MM-DD-YYYY formats, which do not sort chronologically.
  • Do not append vague descriptors like "latest" or "final" to your file versions. Instead, append the version's date or a consistently iterated version number.
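
The conventions above can be applied mechanically. The sketch below builds a filename from its parts and flags names that break the rules. The field order, the two-digit version suffix, and both function names are assumptions chosen for illustration, and a discipline-specific convention should take precedence where one exists.

```python
import re
from datetime import date

# Letters, digits, underscores, and hyphens only, ending in a single file extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")
VAGUE_WORDS = ("latest", "final")
# Two digits, two digits, four digits: the pattern of an MM-DD-YYYY style date.
MDY_DATE = re.compile(r"(?<!\d)\d{2}-\d{2}-\d{4}(?!\d)")


def build_filename(project, description, author, when, version, extension):
    """Assemble a name like soilstudy_moisture_DoeJane_2024-03-01_v02.csv."""
    parts = [project, description, author, when.isoformat(), f"v{version:02d}"]
    return "_".join(parts) + "." + extension


def check_filename(name):
    """Return a list of problems found in a filename (an empty list means none found)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    elif not SAFE_NAME.match(name):
        problems.append("contains characters other than letters, digits, _ and -")
    if any(word in name.lower() for word in VAGUE_WORDS):
        problems.append("uses a vague descriptor such as 'latest' or 'final'")
    if MDY_DATE.search(name):
        problems.append("uses an MM-DD-YYYY style date")
    return problems
```

For instance, build_filename("soilstudy", "moisture", "DoeJane", date(2024, 3, 1), 2, "csv") produces a name that passes check_filename, while "soil data final.csv" is flagged for both its spaces and its vague descriptor.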

Clean Your Data

  • Mistakes happen, and often researchers don't notice at first. If you are manually entering data, be sure to double-check the entries for consistency and duplication. Often having a fresh set of eyes will help to catch errors before they become problems.
  • Tabular data can often be error checked by sorting the fields alphanumerically to catch simple typos, extra spaces, or extreme outliers. Be sure to save your data before sorting it to ensure you do not disrupt the records!
  • Programs like OpenRefine are useful for checking for consistency in coding for records and variables, catching missing values, transforming data, and much more.
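
Some of the checks above can also be scripted. The following sketch reads tabular data and reports extra spaces, empty cells, and duplicate record keys without modifying the data. The function name, the sample records, and the choice of sample_id as the key field are hypothetical, and the reported line numbers assume one record per line.

```python
import csv
import io
from collections import Counter


def find_issues(rows, key_field):
    """Flag stray whitespace, empty cells, and duplicate keys without altering the data."""
    issues = []
    seen = Counter()
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header row
        for field, value in row.items():
            if value != value.strip():
                issues.append((line_no, field, "extra whitespace"))
            if value.strip() == "":
                issues.append((line_no, field, "missing value"))
        key = row[key_field].strip()
        seen[key] += 1
        if seen[key] > 1:
            issues.append((line_no, key_field, "duplicate key"))
    return issues


# Made-up sample data containing one of each problem.
sample = "sample_id,species\nS1,oak\nS2, maple\nS2,\n"
rows = list(csv.DictReader(io.StringIO(sample)))
for issue in find_issues(rows, "sample_id"):
    print(issue)
```

Because the function only reports problems, the original file stays untouched and a person can decide how each flagged record should be corrected.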

What should you do if you need assistance implementing RDM practices?

Whether it's because you need discipline-specific metadata standards for your data, help with securing sensitive data, or assistance writing a data management plan for a grant, help is available to you at CWRU. In addition to consulting the resources featured in this guide, you are encouraged to contact your department's liaison librarian.

Budgeting

If you are planning to submit a research proposal and need assistance with budgeting for data storage and/or applications used to capture, manage, and/or process data, [U]Tech provides information and assistance, including resource boilerplates that list the centralized resources available.

More specific guidance for including a budget for Data Management and Sharing is included in this document: Budgeting for Data Management and Sharing

Custody of Research Data

Unless otherwise agreed in writing, with the agreement on file with the University, the PI is the custodian of research data and is responsible for the collection, management, and retention of research data. The PI should adopt an orderly system of data organization and should communicate the chosen system to all members of a research group and to the appropriate administrative personnel, where applicable. Particularly for long-term research projects, the PI should establish and maintain procedures for the protection and management of essential records.

Resources:

CWRU Custody of Research Data Policy 

Data Sharing

Many funding agencies require data to be shared for the purposes of reproducibility and other important scientific goals. It is important to plan for the timely release and sharing of final research data for use by other researchers. The final release of data should be included as a key deliverable of the DMP. The discipline-specific database, data repository, data enclave, or archive used to disseminate the data should also be documented as needed.

The NIH is mandating Data Management and Sharing Plans for all proposals submitted after Jan. 25, 2023. Guidance for completing an NIH Data Management and Sharing Plan has its own dedicated content to provide investigators detailed guidance on development of these plans for inclusion in proposals.