Data Collection

Data collection is the process of gathering and measuring information used for research. Collecting data is one of the most important steps in the research process and is part of all disciplines, including the physical and social sciences, the humanities, and business. Data comes in many forms and can be recorded and stored in different ways, whether written in a lab notebook or recorded digitally on a computer system.

While methods differ across disciplines, good data management begins with clearly and accurately describing the information recorded, the process used to collect it, and the practices that ensure its quality, and continues with sharing the data to enable reproducibility. This section covers the topics to address when collecting and managing research data.

Learn more about what’s required for data collection as a researcher at Case Western Reserve University. 

Ensuring Accurate and Appropriate Data Collection

Accurate data collection is vital to the integrity of research. When planning and executing a research project, it is important to consider how data will be collected and stored so that the results can be used for publications and reporting. The consequences of improper data collection include:

  • inability to answer research questions accurately
  • inability to repeat and validate the study
  • distorted findings resulting in wasted resources
  • misleading other researchers to pursue fruitless avenues of investigation
  • compromising public policy decisions
  • causing harm to human participants and animal subjects

While the degree of impact from inaccurate data varies by discipline, misrepresented or misused data can cause disproportionate harm. Such misrepresentation includes fraud and other forms of scientific misconduct.

Any data collected in the course of your research should follow research data management (RDM) best practices to ensure accurate and appropriate data collection. This includes, as appropriate, developing data collection protocols and processes that catch and correct inconsistencies and other errors in a timely manner.
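As a minimal sketch of such a protocol check, assuming each collected record arrives as a Python dictionary, the code below flags missing fields, non-numeric or out-of-range values, and duplicate sample IDs so they can be corrected soon after collection. The field names (`sample_id`, `timestamp`, `temperature_c`), the valid range, and the functions `validate_record` and `validate_batch` are hypothetical and illustrative only; a real protocol would define its own fields and rules.

```python
# Hypothetical protocol rules: field names and ranges are illustrative only.
REQUIRED_FIELDS = ("sample_id", "timestamp", "temperature_c")
VALID_RANGES = {"temperature_c": (-80.0, 150.0)}


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one collected record (empty if none)."""
    problems = []
    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Numeric fields must parse and fall inside their valid range.
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value in (None, ""):
            continue  # already reported above if the field is required
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{field} is not numeric: {value!r}")
            continue
        if not low <= number <= high:
            problems.append(f"{field} out of range: {number}")
    return problems


def validate_batch(records: list[dict]) -> dict[int, list[str]]:
    """Map the index of each problematic record to its list of problems."""
    report = {}
    seen_ids = set()
    for index, record in enumerate(records):
        problems = validate_record(record)
        sample_id = record.get("sample_id")
        # A repeated sample ID usually means a copy or labeling error.
        if sample_id in seen_ids:
            problems.append(f"duplicate sample_id {sample_id}")
        elif sample_id:
            seen_ids.add(sample_id)
        if problems:
            report[index] = problems
    return report
```

Running a check like `validate_batch` after each collection session, rather than only at the end of a project, is what allows errors to be caught while they can still be corrected.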

Examples of Research Data

Research data is any information that has been collected, observed, generated or created in association with research processes and findings.

Much research data is digital, but research data can also include non-digital formats such as laboratory notebooks, diaries, or written responses to surveys. Examples include (but are not limited to):

  • Excel spreadsheets that contain instrument data
  • Documents (text, Word) containing study results
  • Laboratory notebooks, field notebooks, diaries
  • Questionnaires, transcripts, codebooks
  • Audiotapes, videotapes
  • Photographs, films
  • Protein or genetic sequences
  • Spectra
  • Test responses
  • Slides, artifacts, specimens, samples
  • Collections of digital objects acquired and generated during the research process
  • Database contents (video, audio, text, images)
  • Models, algorithms, scripts
  • Contents of an application (input, output, log files for analysis software, simulation software, schemas)
  • Source code used in application development

To ensure reproducibility of experiments and results, be sure to include and document information such as: 

  • Methodologies and workflows
  • Standard operating procedures and protocols

Data Use Agreements 

When working with data, it is important to understand any restrictions that apply because of the sensitivity of the data. These include how you download the data, how you share it with collaborators, and how it must be secured.

Datasets can include potentially sensitive data that must be protected rather than openly shared. Such a dataset cannot be shared or downloaded without permission from CWRU Research Administration and may require an agreement between collaborators and their institutions. All parties must abide by the terms of the agreement, including destroying the data once the collaboration is complete.

Storage Options 

UTech provides cloud and on-premise storage to support the university research mission. This includes Google Drive, Box, Microsoft 365, and various on-premise solutions for high speed access and mass storage. A listing of supported options can be found on UTech’s website.

In addition to UTech-supported storage solutions, CWRU maintains an institutional subscription to OSF (Open Science Framework). OSF is a cloud-based data storage, sharing, and project collaboration platform that connects to many other cloud services, such as Google Drive, Box, and GitHub, to increase the visibility and discoverability of your research and data. OSF storage is functionally unlimited.

When selecting a storage platform, it is important to understand how you plan to analyze and store your data. Cloud storage makes it easy to store and share data and offers features such as version history and other protections for your data. On-premise storage is useful when you have large storage demands and need a high-speed connection to the instruments that generate data and the systems that process it. Both types of storage have advantages and disadvantages that you should weigh when planning your research project.

Data Security

Data security is a set of processes and ongoing practices designed to protect information and the systems used to store and process data. This includes computer systems, files, databases, applications, user accounts, networks, and services on institutional premises, in the cloud, and remotely at the location of individual researchers. 

Effective data security takes into account the confidentiality, integrity, and availability of the information and its use. This is especially important when data contains personally identifiable information, intellectual property, trade secrets, or technical data supporting technology transfer agreements (before public disclosure decisions have been made).

Data Categorization 

CWRU uses a three-tier system to categorize research data based on information type and sensitivity. The tier is determined by the risk to the University in the areas of confidentiality, integrity, and availability of data in support of the University's research mission. In this context, confidentiality concerns the extent to which information can be disclosed to others, integrity is the assurance that the information is trustworthy and accurate, and availability is the assurance of reliable access to the information by authorized users.

Information (or data) owners are responsible for determining the impact level of their information (that is, what happens if the data is improperly accessed or accidentally lost), implementing the necessary security controls, and managing the risk of negative events, including data loss and unauthorized access.

The three classification tiers, with examples of each, are:

Public

  • general communications
  • research material used in promoting participation when approved for disclosure
  • publications when approved for disclosure

Internal Use

  • research plans
  • internal only communications
  • home phone numbers
  • data generated from instruments 
  • laboratory notebooks
  • software code
  • data analysis results   

Restricted

  • personally identifiable information
  • export-controlled data
  • electronic protected health information (ePHI)
  • social security numbers associated with a person's name
  • birth date associated with a person's name
  • intellectual property, trade secrets, technical data supporting technology transfer agreements (before public disclosure decisions have been made)

Loss, corruption, or inappropriate access to information can interfere with CWRU's mission, interrupt business operations, and damage reputations or finances.
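Because the three tiers are ordered from least to most sensitive, they map naturally onto an ordered enumeration. The sketch below is a hypothetical illustration in Python, not a CWRU tool: the `Classification` enum, the `EXAMPLE_TIERS` lookup (built from a subset of the examples above), and `classify_dataset` are all invented for this example. It also assumes a rule the text above does not state, namely that a dataset combining several kinds of information takes the most sensitive tier among them; confirm the actual rule with UTech before relying on it.

```python
from enum import IntEnum


class Classification(IntEnum):
    """The three tiers, ordered so that a higher value means more sensitive."""
    PUBLIC = 1
    INTERNAL_USE = 2
    RESTRICTED = 3


# Hypothetical lookup built from a subset of the examples listed above.
EXAMPLE_TIERS = {
    "general communications": Classification.PUBLIC,
    "research plans": Classification.INTERNAL_USE,
    "data generated from instruments": Classification.INTERNAL_USE,
    "laboratory notebooks": Classification.INTERNAL_USE,
    "software code": Classification.INTERNAL_USE,
    "export-controlled data": Classification.RESTRICTED,
    "ePHI": Classification.RESTRICTED,
    "social security numbers associated with a person's name": Classification.RESTRICTED,
}


def classify_dataset(contents: list[str]) -> Classification:
    """Return the most sensitive tier among a dataset's contents.

    Assumed rule (not stated in the policy text): the dataset inherits the
    highest tier of anything it contains. Unknown items raise KeyError so
    they must be classified explicitly rather than silently ignored.
    """
    if not contents:
        raise ValueError("a dataset must list at least one kind of content")
    return max(EXAMPLE_TIERS[item] for item in contents)
```

For example, a dataset holding research plans alongside ePHI would be treated as `Classification.RESTRICTED` under this assumed rule, so it would need the controls for restricted data.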

Securing Data

Each data classification requires certain safeguards or countermeasures, known as controls, to be applied to the systems that store the data. These can include restricting access to the data, detecting unauthorized access, taking preventive measures to avoid data loss, encrypting data in transit and at rest, keeping the system and data in a secure location, and receiving training on best practices for handling data. Controls are classified according to their characteristics, for example:

  • Physical controls, such as doors, locks, climate control, and fire extinguishers;
  • Procedural or administrative controls, such as policies, incident response processes, management oversight, and security awareness and training;
  • Technical or logical controls, such as user authentication (login) and logical access controls, antivirus software, and firewalls;
  • Legal, regulatory, or compliance controls, such as privacy laws, policies, and clauses.

Principal Investigator (PI) Responsibilities

The CWRU Faculty Handbook provides guidelines for PIs regarding the custody of research data, including, where applicable, appropriate measures to protect confidential information. It is everyone's responsibility to ensure that our research data is kept secure and remains available for reproducibility and future research opportunities.

University Technology provides many services and resources related to data security, including assistance with planning and securing data and with processing and storing restricted information used in research.

Data Collected as Part of Human Subject Research 

To ensure the privacy and safety of individuals participating in human subject research studies, additional rules and processes govern how the data collected can be used and disclosed. The Office of Research Administration provides information relevant to conducting this type of research, including:

  • Guidance on data use agreements and processes for agreements that involve human-related data or human-derived samples coming in or going out of CWRU.
  • Compliance with human subject research rules and regulations.

According to 45 CFR 46, a human subject is "a living individual about whom an investigator (whether professional or student) conducting research:

  • Obtains information or biospecimens through intervention or interaction with the individual, and uses, studies, or analyzes the information or biospecimens; or
  • Obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens."

The CWRU Institutional Review Board reviews social science and behavioral studies, as well as low-risk biomedical research not conducted in a hospital setting, for all University faculty, staff, and students. This review covers data collected and used for human subjects research.

Research conducted in a hospital setting, including University Hospitals, requires IRB protocol approval.

Questions regarding the management of human subject research data should be addressed to the CWRU Institutional Review Board.

Getting Help With Data Collection

If you are looking for datasets and other resources for your research, contact your subject area librarian for assistance.

If you need assistance with administrative items such as data use agreements or finding the appropriate storage solution, please contact the following offices.

Guidance and Resources