• This web page describes an activity within the Department of Mathematics at Ohio University, but is not an official university web page.
• If you have difficulty accessing these materials due to visual impairment, please email me at mohlenka@ohio.edu; an alternative format may be available.

MATH 2530X Foundations of Data Science, Fall 2021

Section 100 lecture (class number 15173) and section 101 laboratory (class number 15174)

Syllabus

Acknowledgment:
This class is a clone of the Foundations of Data Science course at the University of California at Berkeley. Many thanks to them for sharing their materials. See the extensive information about their course for the reasons you should take this course.
Catalog Description:
Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership. [source]
Desired Learning Outcomes:
• Students will be able to manipulate data to extract subsets, sort, and compute derived quantities.
• Students will be able to manipulate data to produce visualizations.
• Students will be able to compute statistical quantities from a real-world dataset.
• Students will be able to sample from a dataset and construct empirical confidence intervals from it.
• Students will be able to perform bootstrap resampling from a dataset and construct empirical confidence intervals.
• Students will be able to perform linear regression and inference from it.
• Students will be able to perform classification.
• Students will be able to state the privacy, ownership, and ethical issues arising in data science.
Prerequisites:
(MATH 1060, 1090, 1101, 1200, 1250, 1260, 1322, or 1500) or (Math placement 2 or higher)
Web page:
http://www.ohiouniversityfaculty.com/mohlenka/2221/2530X/.
Class hours/location:
Lecture 3:05 PM to 4:00 PM Monday, Wednesday, and Friday. Laboratory 1:30 PM to 2:50 PM Tuesday. All meetings are in Morton Hall room 314, which is a teaching computer lab.
Instructor:
Martin J. Mohlenkamp, mohlenka@ohio.edu, Morton Hall 321C. The best ways to contact me are email or Teams.
Office hours:
**TBD**
Interaction environment:
We **will** have a class Teams team. The main activities there are in channels:
Content questions:
Ask questions here so that others get the benefit of your questions too. Respond to others' questions to help them figure it out. (Posting solutions to homework questions is not being helpful.) I will answer in the channel so that everyone gets the benefit.
Office hours:
Along with in-person office hours, I will run them as Teams meetings.
Text:
Computational and Inferential Thinking: The Foundations of Data Science, by Ani Adhikari and John DeNero. Available (for free) at https://inferentialthinking.com/.
Computing Environment:
We will be using the Python programming language within Jupyter Notebooks. It is **TBD** whether we will run these notebooks using CoCalc, a JupyterHub, or the Ohio Supercomputer Center.
Lecture attendance:
You are allowed 5 absences (out of 41 classes) without penalty; these include university excused absences for illness, death in the immediate family, religious observance, jury duty, or involvement in University-sponsored activities. Each additional absence will reduce your final average by 0.5%. Your attendance record will be available in Blackboard.
Homework:
There is a homework assignment due in most weeks. Late homework is penalized 10% per day (or part thereof) late. Your lowest two scores are dropped.
Labs:
In 10 of the weeks, the Tuesday lab meeting will be used for (computer) laboratories, to be completed during that time. Missed labs cannot be made up, but your lowest score is dropped.
Projects:
There are three substantial, multi-week projects during the semester. Some time in the Tuesday lab meetings will be used to work on projects. You can do your project alone or with one partner.
Tests and Exam:
There will be one mid-term test. The final exam is on **TBD**.
Your grade is based on the labs 10%, the homework 20%, the projects 25%, the midterm test 15%, and the final exam 30%. An average of 90% guarantees you at least an A-, 80% a B-, 70% a C-, and 60% a D-.
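As a rough illustration of how the weights, dropped scores, and attendance penalty above combine, here is a minimal Python sketch. It rests on several assumptions that the syllabus does not state: all scores are percentages from 0 to 100, assignments within a category count equally, the three projects count equally, and the 0.5% attendance penalty is subtracted as percentage points. The function names are hypothetical, and this is not the official grade calculation.

```python
# Category weights from the syllabus: labs 10%, homework 20%,
# projects 25%, midterm test 15%, final exam 30%.
WEIGHTS = {"labs": 0.10, "homework": 0.20, "projects": 0.25,
           "midterm": 0.15, "final": 0.30}


def mean_after_drops(scores, drops):
    """Average the scores after discarding the `drops` lowest ones."""
    kept = sorted(scores)[drops:]
    return sum(kept) / len(kept)


def final_average(labs, homework, projects, midterm, final, absences):
    """Weighted average in percent, minus the attendance penalty."""
    components = {
        "labs": mean_after_drops(labs, 1),          # lowest lab score is dropped
        "homework": mean_after_drops(homework, 2),  # lowest two homework scores are dropped
        "projects": sum(projects) / len(projects),  # assumption: projects weighted equally
        "midterm": midterm,
        "final": final,
    }
    weighted = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    # Assumption: each absence beyond the 5 allowed costs 0.5 percentage points.
    penalty = 0.5 * max(0, absences - 5)
    return weighted - penalty


def guaranteed_minimum(average):
    """Lowest letter grade guaranteed by the syllabus cutoffs, or None below 60%."""
    for cutoff, letter in [(90, "A-"), (80, "B-"), (70, "C-"), (60, "D-")]:
        if average >= cutoff:
            return letter
    return None


# Made-up scores for a hypothetical student with 7 absences.
average = final_average(
    labs=[100, 80, 90, 0],
    homework=[100, 100, 50, 60],
    projects=[80, 90, 100],
    midterm=80,
    final=90,
    absences=7,
)
print(round(average, 2), guaranteed_minimum(average))
```

With these made-up scores, the weighted average before the penalty is 90.5%; the two extra absences bring it to 89.5%, which guarantees at least a B-.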
You are allowed to use most resources, but there are some limitations.
Unlimited use, without specific acknowledgment:
• The textbook.
• Discussions with me.
• Your partner, for the project.
Allowed, with acknowledgment:
• Websites on statistics, data science, etc.
• Explanations by other students in this class.
• Explanations by friends, roommates, etc.
Acknowledge and describe this help in writing on the problem where it was used. For example, you might write "[Name] explained to me how to do [some part] of this problem" or "I found an explanation of [concept] at the website [url]".
Forbidden:
• The work or programs from students who took this class (in any of its versions at any university).
• Websites that claim to have homework, lab, or project solutions for this class.
• Direct copying.
If you are not sure if something is allowed, then ask me first.
A minor, first-time violation of this policy will receive a warning, along with a discussion and clarification of the rules. Serious or second violations will result in a grade penalty on the assignment. Very serious or repeated violations will result in failure in the class and be reported to the Office of Community Standards and Student Responsibility, which may impose additional sanctions. You may appeal any sanctions through the grade appeal process.
Special Needs:
If you have specific physical, psychiatric, or learning disabilities and require accommodations, please let me know as soon as possible so that your learning needs may be appropriately met. You should also register with Student Accessibility Services to obtain written documentation and to learn about the resources they have available.
Responsible Employee Reporting Obligation:
If I learn of any instances of sexual harassment, sexual violence, and/or other forms of prohibited discrimination, I am required to report them. If you wish to share such information in confidence, then use the Office of Equity and Civil Rights Compliance.

Schedule (Subject to change)

DRAFT!! Subject to change. Textbook links are active now; other links will become active nearer their date.

Week Date Topic Textbook Assignment
1
Mon Aug 23 Introduction 1.1, 1.2, 1.3
Tues Aug 24 Lab 01: Expressions
Wed Aug 25 Cause and Effect 2
Fri Aug 27 Tables 3
2
Mon Aug 30 Data Types 4, 5 Homework 01 due
Tues Aug 31 Lab 02: Table Operations
Wed Sep 1 Building Tables 6.2
Fri Sep 3 Census 6.3, 6.4
3
Mon Sep 6 Labor Day holiday, no class
Tues Sep 7 Lab 03: Data Types and Creating & Extending Tables
Wed Sep 8 Charts 7, 7.1 Homework 02 due
Fri Sep 10 Histograms 7.2, 7.3
4
Mon Sep 13 Functions 8, 8.1 Homework 03 due
Tues Sep 14 Lab 04: Functions and Visualizations
Wed Sep 15 Groups 8.2, 8.3
Fri Sep 17
5
Mon Sep 20 Joins 8.4
Tues Sep 21 Start working on Project 1: World Progress
Wed Sep 22 Table Examples 8.5
Fri Sep 24 Iteration 9, 9.1, 9.2, 9.3 Project 1 checkpoint due
6
Mon Sep 27 Chance 9.5, 18.1
Tues Sep 28
Wed Sep 29 Sampling 10, 10.1, 10.2
Fri Oct 1 Fall break, no class
7
Mon Oct 4 Models 10.3, 11.1
Tues Oct 5 Practice midterm test
Wed Oct 6 Comparing Distributions 11.1, 11.2 Project 1 due
Fri Oct 8 Decisions and Uncertainty 11.3
8
Mon Oct 11 A/B Testing 11.4, 12.1, 12.2
Tues Oct 12
Wed Oct 13 Causality 12.3
Fri Oct 15 Examples 12.2
9
Mon Oct 18
Tues Oct 19 Midterm test
Wed Oct 20 Bootstrap 13, 13.1, 13.2
Fri Oct 22 Confidence Intervals 13.3, 13.4
10
Mon Oct 25 Interpreting Confidence Intervals 14, 14.1, 14.2
Tues Oct 26 Start working on Project 2: Cardiovascular Disease
Wed Oct 27 Center and Spread 14.3, 14.4
Fri Oct 29 The Normal Distribution and Sample Means 14.5
11
Mon Nov 1 Designing Experiments 14.6
Tues Nov 2
Wed Nov 3 Correlation 15, 15.1 Project 2 checkpoint 1 due
Fri Nov 5 Linear Regression 15.2
12
Mon Nov 8 Least Squares 15.3, 15.4
Tues Nov 9
Wed Nov 10 Residuals 15.5, 15.6 Project 2 checkpoint 2 due
Fri Nov 12 Regression Inference 16
13
Mon Nov 15 Regression Wrapup 16
Tues Nov 16 Start working on Project 3: Classifying Movies
Wed Nov 17 Classification 17, 17.1, 17.2, 17.3 Project 2 due
Fri Nov 19 Classifiers 17.4
14
Mon Nov 22 Decisions 18 Project 3 checkpoint due
Tues Nov 23
Wed Nov 24 Thanksgiving holiday, no class
Fri Nov 26 Thanksgiving holiday, no class
15
Mon Nov 29
Tues Nov 30
Wed Dec 1 Project 3 due
Fri Dec 3
16
**TBD** Dec **TBD** Final Exam **TBD** in our regular classroom.

Martin J. Mohlenkamp