# Undergraduate Applied Data Science Minor

To help bring the application of data science to a variety of fields, Case Western Reserve University has developed a unique Applied Data Science undergraduate minor that can be paired with any undergraduate major. The sequence launched in the fall of 2014.

This minor, based in the Case Western Reserve University School of Engineering, is open to all undergraduate majors: engineering, arts, sciences, nursing, and management. Students can choose from eight subdomains in which to concentrate their minor, each of which includes a core curriculum of five 3-credit courses.

Domain areas available for minor concentration are:

• Engineering and Physical Sciences:
  • Energy;
  • Manufacturing; and
  • Astronomy.
• Health:
  • Translational; and
  • Clinical.
• Business:
  • Finance;
  • Marketing; and
  • Economics.

The pathway towards earning the Applied Data Science minor is organized into five levels:

• ENGR 131: Elementary Computer Programming
• EECS 132: Introduction to Programming in Java
• DSCI 133*: Introduction to Data Science and Engineering for Majors
• DSCI 134*: Introduction to Applied Data Science
• OPRE 207: Statistics for Business and Management Science I (3)
• Organizing and summarizing data. Mean, variance, moments. Elementary probability, conditional probability. Commonly encountered distributions including binomial, Poisson, uniform, exponential, and normal distributions. Central limit theorem. Sample quantities, empirical distributions. Reference distributions (chi-square, z-, t-, F-distributions). Point and interval estimation; hypothesis tests.
• Prerequisites: MATH 122 or MATH 126.
• PQHS 431: Statistical Methods in Biological and Medical Sciences I
• STAT 201R (taught using R statistics software): Basic Statistics for Social and Life Sciences (3)
• Designed for undergraduates in the social sciences and life sciences who need to use statistical techniques in their fields. Descriptive statistics, probability models, sampling distributions. Point and confidence interval estimation, hypothesis testing. Elementary regression and analysis of variance.
• Not for credit toward major or minor in Statistics. Counts for CAS Quantitative Reasoning Requirement.
• STAT 312R (taught using R statistics software): Basic Statistics for Engineering and Science (3)
• For advanced undergraduate students in engineering, physical sciences, life sciences. Comprehensive introduction to probability models and statistical methods of analyzing data with the object of formulating statistical models and choosing appropriate methods for inference from experimental and observational data and for testing the model’s validity. Balanced approach with equal emphasis on probability, fundamental concepts of statistics, point and interval estimation, hypothesis testing, analysis of variance, design of experiments, and regression modeling.
• Note: Credit given for only one (1) of STAT 312, 313, 333, 433.
• Prerequisites: MATH 122 or equivalent.

Engineering & Physical Sciences, Health and Business (All Domain areas)

• DSCI 351: Exploratory Data Science for Energy & Manufacturing
• Course Description: Data Sources, Data Assembly, and Exploratory Data Analytics. In this course, we will learn data science and analysis approaches applicable to energy and manufacturing technologies, to identify statistically significant relationships and to better model and predict the behavior of these systems. We will assemble and explore real-world datasets, perform clustering and pair plot analysis to investigate correlations, and employ logistic regression to develop associated predictive models. Results will be interpreted, visualized, and discussed.

We will introduce the basic elements of data science and analytics using the R Project for Statistical Computing. R is an open-source software project with broad abilities to access machine-readable open data resources, data cleaning and munging functions, and a rich selection of statistical packages used for data analytics, model development, and prediction. This will include an introduction to R data types, reading and writing data, looping, plotting, and regular expressions, so that one can start performing variable transformations for linear fitting and developing structural equation models while exploring for statistically significant relationships.

R analytics will be applied to energy systems (such as PV power plant degradation and building energy efficiency) over time, by analyzing system responses, combined with results of experiments, to identify fundamental principles that are statistically significant in the observed system performance. It will also be applied to manufacturing systems to understand the principles of statistical process control and to identify critical factors of variability and uniformity.
• Learning Outcomes:
• Familiarity with R statistics, scripting, functions, packages, and automated data analysis.
• Familiarity with exploratory data analysis and statistical model building.
• Application of domain knowledge and statistical analytics to identify important predictors and develop initial predictive models.
• Dataset characteristics will include:
• Variety of types of information, including both structured and unstructured data.
• Volume: Data from human sources (vendors, suppliers, distributors, customers, etc.) and sensor networks of the energy system of the factory, in both small and large data volumes.
• Velocity: Energy system and manufacturing supply chain changes will be included.
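
The course description above mentions variable transformations for linear fitting. As a rough illustration of that idea only, the sketch below fits an exponential trend by taking the logarithm of the response and running an ordinary least-squares line through the result. It is written in Python rather than the R used in the course, the function names `fit_line` and `fit_exponential` are hypothetical, and the data are synthetic values invented for the example, not course material or measurements.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_exponential(t, y):
    """Fit y = a * exp(b * t) via a linear fit on log(y); requires y > 0."""
    intercept, slope = fit_line(t, [math.log(v) for v in y])
    return math.exp(intercept), slope


# Synthetic, illustrative data (an assumption for this sketch):
# a quantity that starts at 100 and changes at a rate of -0.005 per year.
years = list(range(11))
power = [100.0 * math.exp(-0.005 * t) for t in years]

a, b = fit_exponential(years, power)
print(f"initial level = {a:.2f}, rate per year = {b:.4f}")
```

The log transform turns the multiplicative trend into a straight line, so the fitted slope is the rate constant and the exponentiated intercept is the starting level. Real data would add noise, and fitting on the log scale weights errors differently than fitting the raw values.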

Engineering and Physical Sciences

• ASTR 306: Astronomical Techniques
• DSCI 353: Data Science: Statistical Learning, Modeling and Prediction
• DSCI 330/430: Cognition & Computation

Health

Engineering & Physical Sciences, Health and Business (All Domain areas)

• DSCI 352: Applied Data Science Research
• SYBB 387: Undergraduate Research in Systems Biology

Harnessing the resources of faculty expertise in materials science, electrical engineering and computer science, mechanical and aerospace engineering, systems biology, design and innovation, economics, finance and astronomy, the Applied Data Science minor teaches essential tools and applications within each domain area. This includes:

• data management: Datastores, sources, streams
• distributed computing: local and distributed computing (including Hadoop and other cloud computing)
• informatics, ontology, query: including search, data assembly and annotation
• statistical analytics: including high-level scripting languages such as R, Python, and Ruby

View a presentation on the making of the ADS program given to the Business Higher Education Forum: Crafting a Minor to Produce T-Shaped Graduates