By Erman Ayday
Digital health data (including electronic health records, omics data, medical images and data collected from wearable devices) has the potential to unlock the deepest secrets of life and improve our understanding of common as well as rare inherited diseases, reproductive health and cancer. As the amount of available healthcare data continues to grow rapidly, it is becoming increasingly important to develop tools and techniques that can leverage this data to its fullest potential. By doing so, we can gain a better understanding of the basis of diseases, and ultimately improve our ability to diagnose and treat them.
To make better use of digital health data and pave the way toward personalized and precision medicine, it is crucial to (i) collect various types of data from individuals and (ii) facilitate data sharing and collaboration between data collectors.
Physicians and researchers ideally want continuous monitoring and collection of daily information about their patients. The most recent example was the COVID-19 pandemic, when people's location patterns, contact histories, demographics and phenotypes were all highly relevant to the spread of the virus. At the same time, to facilitate medical research, researchers need to share data so that they can conduct collaborative analyses on healthcare datasets. Unlike large studies, which are primarily led by large research consortiums, studies of relatively small and specific subpopulations (e.g., rare disease analysis) would benefit significantly from collaborations between researchers. Tools that facilitate such collaboration and data sharing would therefore accelerate research through team science and democratize healthcare data sharing. Pooling data in this way allows for more accurate and efficient results, as well as a better understanding of the data.
However, privacy concerns are a significant barrier to such data collection and data sharing initiatives. The sensitive nature of healthcare data and personal information raises concerns about the potential for misuse and abuse if the data is not properly protected. These concerns have led to a number of restrictions on how such data can be collected and shared, making it difficult for researchers to access the data they need to conduct their work. As a result of these restrictions, digital health data is still locked in silos, and for many crucial studies (e.g., studies of rare diseases) the amount of data collected at any single site is insufficient. In addition, it is hard even for physicians to collect continuous behavioral data about their patients.
Wide-scale collection and sharing of digital health data is only possible through privacy-preserving and ethical data collection from individuals and data sharing between entities. An ideal system should give researchers the ability to process and compute over digital health data while simultaneously providing privacy for individuals. Here, the term privacy means that data owners (individuals) have full control over their personal information, including control over which party has access to which part of their data, how long that access lasts, and the ability to revoke access rights. Here at xLab, we have been working on making such systems a reality. We believe that the next generation of digital health data collection and management will only be possible with systems that are built on privacy-by-default and privacy-by-design principles.
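To make the notion of owner control described above a little more concrete, here is a minimal sketch in Python of the three controls it names: which party sees which part of the data, for how long, and revocation. This is purely illustrative and is not a description of any xLab system; the names `Grant`, `ConsentLedger`, `scope` and the example party and data-type labels are all hypothetical, and a real deployment would also need authentication, auditing and cryptographic enforcement.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Grant:
    """One owner-issued permission (hypothetical model)."""
    party: str             # who may access the data
    scope: frozenset       # which parts of the data they may access
    expires_at: datetime   # when the access ends
    revoked: bool = False  # set to True if the owner withdraws access


@dataclass
class ConsentLedger:
    """Hypothetical per-individual record of the access they have granted."""
    grants: list = field(default_factory=list)

    def grant(self, party, scope, now, duration):
        # The owner decides the party, the data types and the time window.
        g = Grant(party, frozenset(scope), now + duration)
        self.grants.append(g)
        return g

    def revoke(self, party):
        # Revocation marks every grant held by that party as withdrawn.
        for g in self.grants:
            if g.party == party:
                g.revoked = True

    def can_access(self, party, data_type, now):
        # Access requires a matching, unexpired, unrevoked grant.
        return any(
            g.party == party
            and data_type in g.scope
            and not g.revoked
            and now < g.expires_at
            for g in self.grants
        )
```

For example, an individual might grant a clinic 30 days of access to heart-rate data only; a request for their genome, a request after the 30 days, or any request after `revoke` is called would all be refused by `can_access`.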