Statistical Analysis Request Requirements
These guidelines address the case in which a CTSC investigator from the Cleveland Clinic, Case/UH, or Metro has a data set related to multi-site research, requires statistical analyses, and is requesting CTSC Voucher funding to support the efforts of biostatistician(s) who will be doing QC of the data set and performing statistical analyses. At the Cleveland Clinic, these personnel will come from the Alpha Beta Team’s chargeback group; at Case Western Reserve University, these biostatisticians will come from the PQHS Biostats Core; and at the MetroHealth System, these biostatisticians will come from the Population Health and Equity Research Institute’s Biostatistics and Data Science Core.
As part of evaluating the voucher request, the Voucher team will ensure that the request meets the CTSC voucher program’s multi-institutional requirements and will evaluate a list of the contents of the data set to ensure that either a) a legally valid Data Use Agreement that applies to the specific data relevant to the multi-institutional research is already in place or b) a person with appropriate regulatory knowledge has provided brief text noting why a Data Use Agreement is not required. The data set provided to the statistician will be a limited data set under HIPAA, i.e., it will not include direct identifiers such as patient name, address, phone number, medical record number, or device serial number. If the person with appropriate regulatory knowledge determines that a Data Use Agreement will be required, the voucher will not be awarded until the DUA has been executed by the participating institutions.
The BERD Lead at the investigator's institution will work with the statistical analysis group at their institution to identify the person(s) best suited to estimate the number of hours that the QC of the data set and the subsequent statistical analysis will require. The BERD Lead will also serve as a liaison between the investigator, the data analysis team at their institution, and the CTSC. We anticipate that in most cases, the estimate of the number of hours required will be provided by the person who will be analyzing the data.
If the investigator has a biostatistician specifically designated and funded by their institution or department to perform the statistical analyses that investigators in their department need, that designated biostatistician will be asked to provide brief text explaining why the analysis cannot be performed by the funded statistician already in place and, where possible, to provide insight on the data set or data analysis request that would facilitate the CTSC voucher time estimate. Investigators whose departments include biostatistician(s) to meet the statistical analysis needs of their department will be encouraged to use their designated, funded biostatistician so that statistical analysis vouchers can be used by investigators who do not have access to a biostatistician. At the Cleveland Clinic, we anticipate that voucher funding is most likely to be awarded to investigators in departments that do not have a funded biostatistician. This could encourage research in these departments.
The investigator will provide a clear description of the data set they are providing and the results they are requesting, including a precise statement of the hypotheses they are investigating or the questions they are trying to answer. The CCF, Case, and Metro statisticians who will be carrying out these requests will analyze quantitative clinical data sets using standard statistical analysis programs that are part of SAS or R. At the time of the quote, it is possible to let the investigator know whether the statistician will be using SAS or R. We cannot provide qualitative or highly specialized analyses (e.g., we cannot offer advanced longitudinal or structural modelling or analyze genomics/proteomics/metabolomics data sets). The quote/estimate provided will include the number of hours the analysis is expected to require and the number of weeks expected for the request to be completed. These numbers should be considered estimates. The person who provides the quote/estimate for the data analysis is anticipated to be the person who will most likely be cleaning, quality-control checking, and analyzing the data. The investigator who requests the analysis will be told that the biostatistician may find unanticipated issues with the data the investigator collected that will cause these data to take longer to analyze.
The statisticians can accept data as clean CSV files (comma-delimited files) or clean Excel spreadsheets, i.e., files that contain values (or missing data) for the variables but no commentary. The statisticians can also accept data that has been exported from a database management package (by the person who manages that database) into a SAS or R data set. The data set the investigator provides to the statistician must be documented in such a way that the statistician can determine what each variable is, including, for example, the name of each variable, its units of measure, and the meanings of the codes used for any coded variable. The statistician will let the investigator know whether they plan to use SAS or R to analyze the data.
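As an illustration only, an investigator could check before submission that every column in a CSV file has an entry in an accompanying data dictionary. The sketch below is not part of the CTSC process. It assumes a hypothetical dictionary file with the columns variable, description, units, and codes, and the function name and sample values are invented for the example.

```python
import csv
import io


def undocumented_columns(data_csv: str, dictionary_csv: str) -> list[str]:
    """Return the data columns that lack a data-dictionary entry.

    Both arguments are CSV text. The dictionary layout (variable,
    description, units, codes) is an assumption made for this sketch.
    """
    # The first row of the data file holds the variable names.
    data_header = next(csv.reader(io.StringIO(data_csv)))

    documented = set()
    for row in csv.DictReader(io.StringIO(dictionary_csv)):
        # An entry counts only if it has both a name and a description.
        if row["variable"].strip() and row["description"].strip():
            documented.add(row["variable"].strip())

    return [col.strip() for col in data_header if col.strip() not in documented]


# Hypothetical example: a small data set with no direct identifiers.
data = "subject_id,age_years,sex,sbp_mmhg\n1,54,1,132\n2,61,2,\n"
dictionary = (
    "variable,description,units,codes\n"
    "subject_id,Study-assigned subject number,,\n"
    "age_years,Age at enrollment,years,\n"
    "sex,Sex recorded at enrollment,,1=male; 2=female\n"
)

missing = undocumented_columns(data, dictionary)
print(missing)  # ['sbp_mmhg']
```

Here the column sbp_mmhg has no dictionary entry, so the statistician could not determine its meaning or units from the documentation alone.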
This mechanism should not be used for urgent requests. The biostatistician funded by the voucher will endeavor to perform the analysis quickly and efficiently. An estimated timeline for completion of the data set QC and statistical analysis work will be provided by the biostatistician when the hours estimate is provided, but this timeline could be impacted by the team's workload and personnel availability at the time when the voucher is funded.