# Undergraduate Applied Data Science Minor

To help bring the application of data science to a variety of fields, Case Western Reserve University has developed a unique Applied Data Science undergraduate minor that can be paired with any undergraduate major. The sequence launched in the fall of 2014.

This minor, based in the Case Western Reserve University School of Engineering, is open to all undergraduate majors: engineering, arts, sciences, nursing, and management. Students can choose from eight subdomains in which to concentrate their minor, each built on a core curriculum of five 3-credit courses.

Domain areas available for minor concentration are:

• Engineering and Physical Sciences:
  • Energy;
  • Manufacturing; and
  • Astronomy.
• Health:
  • Translational; and
  • Clinical.
• Finance;
• Marketing; and
• Economics.

The pathway towards earning the Applied Data Science minor is organized into five levels:

• DSCI 134
• DSCI 135

#### Level 2: Inferential Statistics

• OPRE 207: Statistics for Business and Management Science I (3)
• Organizing and summarizing data. Mean, variance, moments. Elementary probability, conditional probability. Commonly encountered distributions, including binomial, Poisson, uniform, exponential, and normal distributions. Central limit theorem. Sample quantities, empirical distributions. Reference distributions (chi-square, z-, t-, F-distributions). Point and interval estimation; hypothesis tests.
• Prerequisites: MATH 122 or MATH 126.
• EPBI 431: Statistical Methods in Biological and Medical Sciences I
• STAT 312: Basic Statistics for Engineering and Science (3)
• For advanced undergraduate students in engineering, physical sciences, life sciences. Comprehensive introduction to probability models and statistical methods of analyzing data with the object of formulating statistical models and choosing appropriate methods for inference from experimental and observational data and for testing the model’s validity. Balanced approach with equal emphasis on probability, fundamental concepts of statistics, point and interval estimation, hypothesis testing, analysis of variance, design of experiments, and regression modeling.
• Note: Credit given for only one (1) of STAT 312, 313, 333, 433.
• Prerequisites: MATH 122 or equivalent.
• SYBB 310: Healthcare Data Analytics in R
• As part of the Data Science Minor, SYBB 310 is designed to introduce students to the basic tools used in data science, focusing on elementary statistics and building up to regression models. In this course, we will provide hands-on training in statistical programming through the use of the open-source statistical computing language, R. Over the semester, students will gain a practical understanding of the essential statistics needed for data science, and students will apply these principles using R to analyze a large dataset of 10,000 patients’ de-identified electronic medical records. No background in statistics or programming is expected for this course.
• Undergraduate Prerequisites: EECS 131; or EECS 132; or equivalent proficiency
• STAT 201R (taught using R statistics software): Basic Statistics for Social and Life Sciences (3)
• Designed for undergraduates in the social sciences and life sciences who need to use statistical techniques in their fields. Descriptive statistics, probability models, sampling distributions. Point and confidence interval estimation, hypothesis testing. Elementary regression and analysis of variance.
• Not for credit toward major or minor in Statistics. Counts for CAS Quantitative Reasoning Requirement.

#### Level 3: Exploratory Applied Data Science

• Engineering & Physical Sciences
• DSCI 351: Exploratory Data Science for Energy & Manufacturing
• Course Description: Data Sources, Data Assembly, and Exploratory Data Analytics. In this course, we will learn data science and analysis approaches applicable to energy and manufacturing technologies, to identify statistically significant relationships and better model and predict the behavior of these systems. We will assemble and explore real-world datasets, perform clustering and pair-plot analysis to investigate correlations, and employ logistic regression to develop associated predictive models. Results will be interpreted, visualized, and discussed.

We will introduce the basic elements of data science and analytics using the R Project for Statistical Computing. R is an open-source software project with broad abilities to access machine-readable open data resources, data cleaning and munging functions, and a rich selection of statistical packages used for data analytics, model development, and prediction. This will include an introduction to R data types, reading and writing data, looping, plotting, and regular expressions, so that one can start performing variable transformations for linear fitting and developing structural equation models while exploring for statistically significant relationships.

R analytics will be applied to energy systems (such as PV power plant degradation and building energy efficiency) over time, by analyzing system responses combined with experimental results to identify the fundamental principles that are statistically significant in the observed system performance. It will also be applied to manufacturing systems to understand the principles of statistical process control and to identify critical factors of variability and uniformity.
• Learning Outcomes:
• Familiarity with R Statistics, scripting, functions, packages, and automated data analysis.
• Familiarity with exploratory data analysis and statistical model building.
• Application of domain knowledge and statistical analytics to identify important predictors and develop initial predictive models.
• Dataset characteristics will include:
• Variety: Types of information, including both structured and unstructured data.
• Volume: Data from human sources (vendors, suppliers, distributors, customers, etc.) and from sensor networks of the factory's energy system, in both small and large data volumes.
• Velocity: Energy system and manufacturing supply chain changes will be included.
• Health
• SYBB 311: Survey of Bioinformatics (4)
• This course is offered as four separate one-month-long units (1 credit each). These units are designed to take a student through the entire workflow of a bioinformatics research project, from data collection to data integration to research applications. Graduate students can select specific units based on their needs. The overall course grade will be an average of the unit grades.
• Technologies in Bioinformatics;
• Data Integration in Bioinformatics;
• Translational Bioinformatics; and
• Programming for Bioinformatics.
• Course Unit Descriptions:
• 311/411A: Technologies in Bioinformatics: This course introduces students to the high-throughput technologies used to collect data for applications in genomics, proteomics, and metabolomics (e.g. mass spectrometry; gene sequencing; yeast-two-hybrid; microarrays).
• 311/411B: Data Integration in Bioinformatics: This course introduces students to the conceptual models used to integrate and interpret data collected by high-throughput technologies. These models range from knowledge organization structures (e.g. biomedical ontologies) to models of interaction (e.g. gene coexpression networks or protein interaction networks), as well as statistical concepts for dealing with such data.
• 311/411C: Translational Bioinformatics: This course introduces students to the clinical and real-world applications of bioinformatics, e.g. pharmacogenomics, GWAS of particular diseases, personalized medicine, systems medicine, microbiome analysis, etc. This course shows students how bioinformatic technologies and methods of data integration can be combined for various applications in biomedical research.
• 311/411D: Programming for Bioinformatics: This course will serve as a basic introduction to 1-2 programming languages, focusing on the applications, tools, and packages specifically related to bioinformatics. R, Python, Java, C++, and/or Perl may be taught as the instructor sees fit.
• SYBB 321: Clinical Informatics at the Bedside and the Bench Part I (3)
• This two-semester series provides students with an overview of the field of clinical informatics, focusing on the content areas outlined by the American Medical Informatics Association; the first semester will emphasize the use of informatics in clinical settings (i.e. "the bedside"), and the second semester will emphasize the use of informatics in public health, epidemiology, and translational bioinformatics (i.e. "the bench"). Through lectures, readings, and projects, students will learn to approach problems in clinical medicine through the lens of informatics, the science of information, with a focus on applications over theory. As clinical informatics revolves around the development and use of electronic medical records (EMRs), students will be familiarized with EMRs through a hands-on lab simulating clinical workflows.
• NUND 403B
• MKMR 201
• This is an introductory marketing course designed to provide students with the concepts and theories necessary for understanding the fundamental principles of marketing and its role in any organization. Students will learn concepts such as marketing orientation, marketing-mix, relationship marketing and service logic, as well as behavioral theories of customer response and strategic frameworks for customer brand management. Students develop capabilities for understanding marketing issues in real world situations and to create and implement basic marketing plans.

• DSCI 352/452
• SYBB 387

#### Level 5: Modeling & Prognostics

Engineering & Physical Sciences

Health