I was initially trained in Computer Science, then in Molecular Biology and Bioinformatics, and more recently in Statistics. This mixed background has allowed me to engage in research and teaching activities at the interface of these three areas. Over the last 15 years, and especially since joining the faculty at CWRU, I have witnessed tremendous changes in these disciplines, not only individually but also in how they intersect and complement each other. Now more than ever, it is an exciting time to work in the evolving field of Data Science and to create new and powerful ways to study and model complex systems and phenomena, such as those in genomics, proteomics, metabolomics and biomedical science.
My research interests focus on real-world research problems in computational and statistical biology, with emphasis on developing data mining methods for high-dimensional data, mostly from high-throughput technologies. Because of my early background in computer science, I have always devoted time to developing computational resources such as web tools and software. Recently, I have been successful in securing NIH funding to support a long-term research project entitled “Survival Bump Hunting for Identifying and Characterizing Informative Diagnostics and Prognostics Subgroups of Patients in High Dimensional Data”, with direct clinical implications in precision medicine as well as early detection and intervention. This has led so far to the publication of several articles, a book chapter, websites and software (see references in sections below and following links).
Conventional statistical models are inappropriate when dealing with large datasets where the number of variables exceeds the number of observations (the so-called p >> n paradigm). This is a challenging problem that carries severe risks of model overfitting and statistical errors. Particular issues posed by high-dimensional data are the control of error rates due to the inherent noise of the technologies employed, the multi-collinearity of predictors due to the parallel nature of variable interrogation, and the sparsity of informative predictors due to the massive number of variables interrogated compared to the small number of variables actually at play.
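The error-rate issue can be illustrated with a minimal sketch on purely simulated Gaussian noise. The function names, sample sizes and threshold below are illustrative assumptions and are not taken from any study cited here. With n = 20 observations and p = 1000 uninformative variables, an uncorrected 5% test is expected to flag roughly 50 variables by chance alone, and the strongest of these spurious correlations can look convincing.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spurious_associations(n=20, p=1000, r_crit=0.444, seed=1):
    """Simulate p pure-noise predictors and a pure-noise outcome.

    Returns (largest |r|, number of predictors with |r| > r_crit).
    r_crit = 0.444 is approximately the two-sided 5% critical value of
    Pearson's r for n = 20; it is an assumption tied to that sample size.
    """
    rng = random.Random(seed)
    # Outcome and predictors are independent by construction,
    # so every "significant" correlation found here is a false positive.
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    abs_r = []
    for _ in range(p):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        abs_r.append(abs(pearson(x, y)))
    return max(abs_r), sum(r > r_crit for r in abs_r)


if __name__ == "__main__":
    max_r, n_hits = spurious_associations()
    print(f"largest |r| among noise predictors: {max_r:.2f}")
    print(f"predictors passing an uncorrected 5% test: {n_hits}")
```

Because no predictor is truly informative in this simulation, every variable that passes the threshold is a false discovery, which is why error-rate control and regularization matter in the p >> n setting.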
My research focus is in Computational/Statistical Biology, with emphasis on developing data mining methods for high-dimensional data (p >> n paradigm) as generated by high-throughput metabolomics, proteomics and sequencing technologies.
Recent work has focused on:
- Bump Hunting in Classification, Regression and Survival settings for High-Dimensional Data: General applications are in identifying, predicting and characterizing informative subgroups. One method currently under development is “Survival Bump Hunting”, where the outcome of interest is a time-to-event, possibly censored. An application is in identifying and characterizing informative prognostic subgroups of patients for risk and reliability analysis. Direct clinical implications are in improved diagnostic and prognostic tools for personalized medicine as well as early diagnosis, prevention and intervention.
- Model Selection and Predictive Modeling (Bayesian and Frequentist) applied to Differential Expression, Association and Interaction Problems: Recent discoveries were made in genetic association, biomarker discovery, and proteomics interaction studies.
- Data Integration of High-Dimensional Data using either Statistical or Network-Based Models.
- Regularization and Variance Stabilization of High-Dimensional Data: Statistical Computing and Software Development; Resampling and Monte-Carlo methods; Parallel Computing and Computational Complexity; Source Code Management and Collaborative Software Development (GitHub).
Awards and Honors
- 2013-2018: NIH NCI Research Grant Principal Investigator (R01 CA 160593)
- 2003-2006: NIH NCI CoGEC Training Grant Fellow (R25 CA 094186)
Publications In Preparation or Submitted
- DAZARD J-E., CHOE M., PAWITAN Y., RAO J.S. Identification and Characterization of Informative Prognostic Subgroups by Survival Bump Hunting. (in prep 2018).
- DAZARD J-E., RAO J.S. Variable Selection Strategies for High-Dimensional Survival Bump Hunting using Recursive Peeling Methods. (in prep 2018).
- DIAZ D.A., RAO J.S., DAZARD J-E. On the Explanatory Power of Principal Components. (in prep 2018). Archives of Cornell University Library
- ZHANG Z., DAZARD J-E., BEBEK G. N-Node Subnetwork Enumerating Algorithm (N-SEA) Identifies Lower Grade Glioma Subtypes with Altered Subnetworks and Distinct Prognostics (in prep 2018).
- DIAZ D.A., SAENZ J.P., RAO J.S., DAZARD J-E. Mode Hunting through Active Information. Applied Stochastic Models in Business and Industry (2018) in press. PMCID:pending.
- DAZARD J-E., ISHWARAN H., MEHLOTRA R.K., WEINBERG A., ZIMMERMAN P.A. Ensemble Survival Tree Models to Reveal Pairwise Interactions of Variables with Time-to-Events Outcomes in Low-Dimensional Setting. Statistical Applications in Genetics and Molecular Biology 2018 Feb 17;17(1). PMCID: PMC5844232
- STETSON D.M., DAZARD J-E., BARNHOLTZ-SLOAN J. Protein Markers Predict Survival in Glioma Patients. Mol. Cell. Proteomics 2016 Jul;15(7):2356-65. PMCID: PMC4937509
- DAZARD J-E., CHOE M., LEBLANC M., RAO J.S. Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods. Statistical Analysis and Data Mining (2016) 9(1):12-42. PMCID: PMC4809437.
- DAZARD J-E., CHOE M., LEBLANC M., RAO J.S. R package PRIMsrc: Bump Hunting by Patient Rule Induction Method for Survival, Regression and Classification. In JSM Proceedings, Statistical Programmers and Analysts Section. Seattle, WA, USA. American Statistical Association-IMS (2015) p. 650-664. PMCID: PMC4718587.
- DAZARD J-E., CHOE M., LEBLANC M., RAO J.S. Cross-Validation of Survival Bump Hunting using Recursive Peeling Methods. In JSM Proceedings, Survival Methods for Risk Estimation/Prediction Section. Boston, MA, USA: American Statistical Association-IMS (2014) p. 3366-3380. PMCID: PMC4795911.
- DAZARD J-E., SANDLERS Y., DOERNER S., BERGER N.A., BRUNENGRABER H. Metabolomics in APCMin/+ Mice Genetically Susceptible to Intestinal Cancer. BMC Systems Biology (2014) 8:72-93. PMCID: PMC4099115.
- DIAZ D.A., RAO J.S., DAZARD J-E. Optimization of the Patient Rule Induction Method (PRIM) under Normality. Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction (S.Co. 2013). Milan, Italy. PMCID: NA.
- SAHA S., DAZARD J-E., XU H., EWING R.M. Computational Framework for Analysis of Prey-Prey Associations in Interaction Proteomics Identifies Novel Human Protein-Protein Interactions and Networks. J. Proteome Research (2012) 11(9):4476-87. PMCID: PMC3777680
- DAZARD J-E., SAHA S., EWING R.M. ROCS: A Reproducibility Index and Confidence Score for Interaction Proteomics Studies. BMC Bioinformatics (2012) 13(1) 128. PMCID: PMC3568013.
- DAZARD J-E., RAO J.S., MARKOWITZ S. Local Sparse Bump Hunting Reveals Molecular Heterogeneity of Colon Tumors. Statistics in Medicine (2012) 31(11-12), 1203-1220. PMCID: PMC3668571.
- DAZARD J-E., RAO J.S. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data. Comput. Statist. Data Anal. (2012) 56(7): 2317-2333. PMCID: PMC3375876.
- SCHLATZER D.M.*, DAZARD J-E.* (*equal), EWING R.M., ILCHENKO S., TOMECHEKO S.E., EID S., HO V., YANIK G., CHANCE M.R., COOKE K.R. Human Biomarker Discovery and Predictive Models for Disease Progression in Idiopathic Pneumonia Syndrome Following Allogeneic Stem Cell Transplantation. Mol. Cell. Proteomics (2012) 11(6): 1-15. PMCID: PMC3433920.
- DAZARD J-E., XU H., RAO J.S. R package MVR for Joint Adaptive Mean-Variance Regularization and Variance Stabilization. In JSM Proceedings, Statistical Programmers and Analysts Section. Miami Beach, FL, USA. American Statistical Association-IMS (2011) p. 3849-3863. PMCID: PMC4725579.
- DAZARD J-E., RAO J.S. Regularized Variance Estimation and Variance Stabilization of High-Dimensional Data. In JSM Proceedings, High-Dimensional Data Analysis and Variable Selection Section. Vancouver, BC, Canada. American Statistical Association-IMS (2010) p. 5295-5309. PMCID: PMC4727967.
- DAZARD J-E., RAO J.S. Local Sparse Bump Hunting. J. Comp Graph. Statistics (2010) 19(4): 900-929. PMCID: PMC3293195.
- CARTIER K., MISCIMARRA L., DAZARD J-E., SONG Y, IYENGAR S., RAO J. S. Studying Genetic Determinants of Natural Variation in Human Gene Expression Using Bayesian ANOVA. BMC Genetics (2007) 1:S115. PMCID: PMC2367590.
- DIAZ D.A., DAZARD J-E., RAO J.S. "Unsupervised Bump Hunting Using Principal Components". In: Ahmed SE, editor. Big and Complex Data Analysis: Methodologies and Applications. Contributions to Statistics, vol. Edited Refereed Volume. Cham Heidelberg New York: Springer. ISBN 978-3-319-41573-4. PMCID: pending. Archives of Cornell University Library.
- DAZARD J-E. (author/maintainer). R package IRSF: Interaction Random Survival Forest (2017). GitHub repository.
- DAZARD J-E. (author/maintainer), CHOE M., LEBLANC M, SANTANA A. R package PRIMsrc: Bump Hunting by Patient Rule Induction Method for Survival, Regression and Classification (2015). GitHub repository. Comprehensive R Archive Network (CRAN)
- DAZARD J-E. (author/maintainer), XU H., SANTANA A. R package MVR: Mean Variance Regularization (2011). GitHub repository. Comprehensive R Archive Network (CRAN).
- DAZARD J-E. (author/maintainer). R package ROCS: Reproducibility Index and Confidence Score for Interaction Proteomics Studies (in prep) GitHub repository: https://github.com/jedazard/ROCS
- DAZARD J-E. (author/maintainer). R package LSBH: Local Sparse Bump Hunting (in prep). GitHub repository: https://github.com/jedazard/LSBH