NIH Data Sharing FAQ

Data Sharing

Data sharing achieves many important goals for the scientific community, such as

  • reinforcing open scientific inquiry
  • encouraging diversity of analysis and opinion
  • promoting new research, testing of new or alternative hypotheses, and methods of analysis
  • supporting studies on data collection methods and measurement
  • facilitating education of new researchers
  • enabling the exploration of topics not envisioned by the initial investigators
  • permitting the creation of new datasets by combining data from multiple sources.

Everyone benefits, including investigators, funding agencies, the scientific community, and, most importantly, the public. Data sharing provides more effective use of NIH resources by avoiding unnecessary duplication of data collection. It also conserves research funds to support more investigators. The initial investigator benefits, because as the data are used and published more broadly, the initial investigator's reputation grows.

National scientific organizations have made a commitment to the sharing and archiving of data through their ethical codes (e.g., the American Sociological Association) or publication policies (e.g., the American Psychological Association). Many years ago the National Academy of Sciences described the benefits of sharing data (see books.nap.edu/catalog/2033/sharing-research-data). For years the National Science Foundation (NSF) Economics Program has required data underlying an article arising from an NSF grant to be placed in a public archive. Similar expectations exist at the National Institute of Justice. Moreover, many scientific journals require that authors make available the data included in their publications. In the biological sciences, protein and DNA sequences are made available to researchers through data archives, such as GenBank. Since 1996, NIH has required data sharing in several areas, such as DNA sequences, mapping information, and crystallographic coordinates.

Potentially all kinds of data are candidates for sharing, but unique data are especially important. Some biologic sciences already have data-sharing plans in place, such as genetic mapping. But other basic science data are also amenable to sharing. Data from human subjects (e.g., surveys, clinical studies) also can be shared if the identity and privacy of research participants can be protected.

Examples of shared epidemiologic data include the Framingham Heart Study, the Honolulu Heart Program, the Atherosclerosis Risk in Communities, Epidemiology of Chronic Disease in the Oldest Old, and the Iowa 65+ Rural Health Study. Examples of shared data from clinical trials include the Asymptomatic Cardiac Ischemia Pilot, the Intermittent Positive Pressure Breathing Study, and the Safety and Efficacy Trial of Zidovudine for Asymptomatic HIV Infected Individuals. Examples of shared datasets from the basic sciences include a growing number of genome sequences and maps, as well as protein and nucleotide databases (see ENTREZ ncbi.nlm.nih.gov/Database/index.html) and other resources for molecular biology at the National Center for Biotechnology Information at ncbi.nlm.nih.gov.

It depends. Participants' privacy must be protected in accord with all applicable laws and regulations. Clinical trial datasets are frequently rich in items that could potentially identify individual subjects. For example, many early phase trials use small samples, which makes it difficult to protect the privacy of the participants. Researchers who are planning clinical trials and intend to share the resulting data should think carefully about the study design, the informed consent documents, and the structure of the resulting data prior to the initiation of the study.

There are many precedents for sharing of clinical trial data. For example, data from a number of clinical trials supported by the National Heart, Lung, and Blood Institute (NHLBI) are available for research use (see https://biolincc.nhlbi.nih.gov/home). The National Institute of Allergy and Infectious Diseases (NIAID) also lists the clinical trial datasets that it has made available for public use through the National Technical Information Service (NTIS) (see https://www.niaid.nih.gov/research/tools-datasets-and-services).

You should check the publication to see if reference is made to an archive, an enclave, or a Website where the data might be available. If no such information is provided, you may wish to send a letter to the PI to see if the data are available for sharing, and where you might be able to get the data and associated documentation.

Data Sharing Plans

Yes. By the October 1, 2003 application receipt date, NIH requests that all extramural applicants seeking $500,000 or more in direct costs in any one year provide a data-sharing plan in their applications.

Scientists submitting grant, cooperative agreement, or contract applications should include a data-sharing plan, or provide justification for the absence of such a plan, in a brief paragraph to be placed immediately after the Research Plan Section (i.e., immediately after PHS 398 Section I. Letters of Support in the Research Plan Section of their application) so it does not count toward the application page limit. Additional information on data sharing might be included in other sections of the application, as appropriate. For example, if you are producing a large dataset that will become an important resource for the scientific community, you probably want to mention this in the significance section. If you are requesting funds to prepare, document, and archive the data, you would want to include relevant information in the budget and budget justification sections. In the Human Subjects section of the application, you should discuss the potential risks to research participants posed by data sharing and steps you will take to address those risks.

Yes. The specific nature of the data you will collect will determine whether or not you may share the final dataset. If the final data are not amenable to sharing, for example, if they are proprietary, then you need to explain this in your application. Under the Small Business Act, SBIR grantees may withhold their data for 4 years after the end of the award. The Small Business Act provides authority for NIH to protect from disclosure and nongovernmental use all SBIR data developed from work performed under an SBIR funding agreement for a period of 4 years after the closeout of either a Phase I or Phase II grant unless NIH obtains permission from the awardee to disclose these data. The data rights protection period lapses only upon expiration of the protection period applicable to the SBIR award, or by agreement between the small business concern and NIH.

No. Reviewers will not factor the proposed data-sharing plan into the determination of scientific merit or priority score. Program staff is responsible for overseeing the data-sharing policy and for assessing the appropriateness and adequacy of the proposed data-sharing plan. Program concerns must be resolved prior to making any award.

NIH recognizes that there may be circumstances where a cofunder has requested restrictions on data sharing as a condition of funding. These restrictions should be identified in the application and a proposal made about how data from the cofunded project will be shared. Should you believe that you are unable to share any of the data, your justification will be considered by NIH program staff.

As is the case with PIs who submit any additional or revised application material, your revised data-sharing plan must be signed by your institutional official and by you.

Yes, as long as such costs are reasonable and reflect actual costs associated with complying with the request. These expenses for preparing and shipping the data might include costs of personnel, computing time, supplies, and other directly related expenses. NIH requirements for accountability for various types of income under NIH grants are specified elsewhere; see https://grants.nih.gov/grants/policy/nihgps/HTML5/section_8/8.3_management_systems_and_procedures.htm#Program

Data Sharing Rules

When the PI and the authorized institutional official sign the face page of an NIH application, they are assuring compliance with policies and regulations governing research awards. NIH expects grantees to follow these rules and to conduct the work described in the application. Thus, if an application describes a data sharing plan, NIH expects that plan to be enacted. In some instances, NIH may make data sharing a term and condition of award. Under specific circumstances, your data also may be accessible through the Freedom of Information Act (FOIA): if your competitive grant was awarded after April 17, 2000 and your data were cited in a Federal regulation or administrative order, then your data may be accessible through FOIA (see https://grants.nih.gov/grants/policy/nihgps/HTML5/section_8/8.3_management_systems_and_procedures.htm#Program).

No. Data-sharing plans should encompass all data from funded research that can be shared without compromising individual subjects' rights and privacy, regardless of whether the data have been used in a publication. Furthermore, data sharing prior to the publication of major results is encouraged in many instances, for example, when data are collected to provide a resource for the scientific community (as in the case of many large surveys).

Recognizing that the value of data often depends on their timeliness, data sharing should occur in a timely fashion. NIH expects the timely release and sharing of data to be no later than the acceptance for publication of the main findings from the final dataset. This time point will be influenced by the nature of the data collected. Data from small studies can be analyzed and submitted for publication relatively quickly. If data from large epidemiologic or longitudinal studies are collected over several discrete time periods or waves, data should be released in waves as data become available or main findings from waves of the data are published. NIH recognizes that the investigators who collected the data have a legitimate interest in benefiting from their investment of time and effort. NIH continues to expect that the initial investigators may benefit from the first and continuing use, but not from prolonged exclusive use. While NIH also understands that an institution's desire to exercise its intellectual property rights may justify a need to delay disclosure of research findings, a delay of 30 to 60 days is generally viewed as a reasonable period for such activity.

In addition to publishing small datasets, there are several alternatives to responding to each separate request to share data (e.g., putting data in an archive or restricted access facility, and setting up a web site for data access). Archives and data enclaves provide technical assistance for users with questions or problems and may spare busy investigators time.

Yes. Your data-sharing plans should indicate the criteria for deciding who can receive your data and whether or not you will place any conditions on their use. Data should be made as widely and freely available as possible while safeguarding the confidentiality of the data and privacy of participants. You should not place limits on the questions or methods others might pursue nor should you require co-authorship as a condition for receiving the data.

No, but if you plan to collect additional data from those subjects under a grant with a data-sharing plan, you should revise the consent procedure to be consistent with the data-sharing plan. In preparing and submitting a data-sharing plan during the application process, investigators should avoid developing or relying on consent processes that promise research participants not to share data with other researchers. Such promises should not be made routinely or without adequate justification described in the data-sharing plan.

If any NIH support (i.e., partial support) is provided for resource development, even if those research resources were developed primarily with non-NIH funds, then those research resources must be shared in line with NIH policy as if NIH funded the entire project. It should be emphasized that although a data sharing plan is only required of grants awarding direct costs of $500,000 or more in any one year, data sharing itself (without a specific plan submission) continues to be a requirement of all NIH-funded grants. If the P30 maintains core resources that actually house and are the final repository of the data, e.g., a high throughput array analysis core, then any project using the center’s resources would be subject to the center’s data sharing plan.

Data Explained

By "final research data," we mean recorded factual material commonly accepted in the scientific community as necessary to validate research findings. Final research data do not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.

Sometimes. For example, where NIH support is sought to transform or link datasets (as opposed to producing new data), the investigator should include a data-sharing plan in the application.

By "unique data" we mean data that cannot be readily replicated. Examples of studies producing unique data include: large surveys that are too expensive to replicate; studies of unique populations, such as centenarians; studies conducted at unique times, such as a natural disaster; studies of rare phenomena, such as rare metabolic diseases.

It is appropriate to acknowledge the source of data upon which a manuscript is based. Many investigators include this information in the methods and/or reference sections of their manuscripts. Journals generally include an acknowledgement section, in which the authors can recognize people who helped them gain access to the data. However, you should check the policies of the journal to which you plan to submit.

Privacy

It is the responsibility of the investigators, their IRB, and their institution to protect the rights of participants and the confidentiality of their data. Data should be redacted to strip all individual identifiers, and effective strategies should be adopted to minimize risk of disclosing a participant's identity. Options to protect privacy include: withholding part of the data, statistically altering the data in ways that will not compromise secondary analyses, requiring researchers who seek data to commit to protect privacy and confidentiality, and providing data access in a controlled site, sometimes referred to as a data enclave. Some investigators use hybrid methods, releasing a redacted dataset for general use but providing access to more sensitive data through a user contract or data enclave. In most instances, sharing data is possible without compromising participant confidentiality and privacy.
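As an illustration only, the first option above (stripping individual identifiers before release) can be sketched in a few lines of Python. The field names and the list of identifiers are hypothetical; a real study would define its own list with its IRB, and removing direct identifiers alone does not guarantee that participants cannot be re-identified.

```python
# Hypothetical set of direct-identifier fields; not an official NIH list.
DIRECT_IDENTIFIERS = {"name", "ssn", "address", "phone"}


def redact_records(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with direct-identifier fields removed.

    The input records are left unchanged, so the investigator keeps the
    full dataset while a redacted copy is prepared for sharing.
    """
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Made-up example records for illustration.
participants = [
    {"name": "A. Example", "ssn": "000-00-0000", "age": 67, "systolic_bp": 142},
    {"name": "B. Example", "ssn": "000-00-0001", "age": 71, "systolic_bp": 128},
]

shared = redact_records(participants)
```

In a hybrid approach like the one described above, a redacted copy such as `shared` would be released for general use, while the original records would stay behind a user contract or in a data enclave.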

Yes. NIH recognizes that data sharing may be complicated or limited, in some cases, by institutional policies or local IRB rules, as well as by local, state and Federal laws and regulations like the Privacy Rule. To protect the rights and privacy of people who participate in NIH-sponsored research, data intended for broader use should be free of identifiers that would permit linkages to individual research participants, and exclude variables that could lead to deductive disclosure of the identity of individual subjects. When data sharing is limited, applicants should explain such limitations in their data sharing plans.
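One common way to look for variables that could lead to deductive disclosure is to count how many records share each combination of indirectly identifying values and flag combinations that occur only a few times. The sketch below is a minimal illustration of that idea, not an NIH-prescribed procedure; the field names and the threshold `k` are assumptions chosen for the example.

```python
from collections import Counter


def small_cells(records, quasi_identifiers, k=5):
    """Return value combinations shared by fewer than k records.

    quasi_identifiers: field names that are not direct identifiers but
    could identify someone in combination (hypothetical examples below).
    k: minimum group size considered safe; the default of 5 is an
    illustrative assumption, not a policy value.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records: three participants share one profile, one stands alone.
records = [
    {"age_group": "65-69", "county": "A"},
    {"age_group": "65-69", "county": "A"},
    {"age_group": "65-69", "county": "A"},
    {"age_group": "100+", "county": "B"},
]

risky = small_cells(records, ["age_group", "county"], k=2)
```

Combinations flagged this way are candidates for coarsening, withholding, or release only through a restricted-access arrangement.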

Not necessarily. The collection of sensitive data does not preclude sharing. For example, the National Center for Chronic Disease Prevention and Health Promotion at CDC operates the Youth Risk Behavior Surveillance System (YRBSS), available at https://www.cdc.gov/healthyyouth/data/yrbs/, which provides data on six health risk behaviors among youth: unintentional injuries and violence, tobacco use, alcohol and other drug use, sexual behaviors, dietary behaviors, and physical activity. Similarly, data from the National Survey of Family Growth, which includes statistical data on family life, marriage and divorce, contraception, sexual experience, pregnancy, and infertility, can be obtained from the National Center for Health Statistics. Sensitive data can be shared so long as appropriate privacy safeguards are in place. Investigators must determine if and how the rights and privacy of the subjects can be protected. And investigators collecting data on sensitive and illegal behaviors should obtain a Certificate of Confidentiality (https://grants.nih.gov/policy/humansubjects/coc.htm) to protect against the involuntary release of data that could identify research participants.

Data Archiving

Maybe. Archives are organizations that collect and distribute data. They understand what is needed to prepare data for wider distribution and documentation for users. They provide stable, reliable, and cost-effective means for distributing data. They also provide protections for the dataset and technical assistance for requestors.

Guidance is available from a variety of sources. For example, the Inter-University Consortium for Political and Social Research at the University of Michigan has prepared an excellent set of guidelines for preparing data for archiving. While these guidelines were written with social science data in mind, they are broadly applicable. See http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf. For molecular biology information, the National Center for Biotechnology Information (NCBI), a division of the National Library of Medicine (NLM) at the National Institutes of Health, is ready to assist researchers who have genome-specific and molecular data to submit. For more information about submitting and accessing NCBI data, see the NCBI Website at http://www.ncbi.nlm.nih.gov.

NIH recognizes that it takes time and money to prepare data for sharing. You can request funds for data archiving and sharing as part of your grant application for collecting the data. If you have already collected the data, you may want to ask your NIH Project Officer about a competitive or administrative supplement. NIH recommends that you consider procedures and costs for data sharing during the application process rather than after the data have been collected.

Investigators need to find a balance between the value of the final data and the costs associated with archiving. If the data are of limited usefulness, then it is probably not worth the expense and effort of putting them in an archive. However, if the investigator has published results based on this dataset, then the dataset should be shared.

Yes. For example, GenBank (http://www.ncbi.nih.gov/genbank/) and Entrez (http://www.ncbi.nih.gov/Entrez/) archive gene sequencing data. The sharing of materials, data, and software in a timely manner has been an essential element in the rapid progress that has been made in the genetic analysis of mammalian genomes.