DATA 301: Introduction to Data Science
Spring 2022

Instructor: Alexander Dekhtyar, dekhtyar@calpoly.edu, 14-210

Office Hours:
When Who Where
Monday 9:10am - 10:00am Alex 14-212
Tuesday 9:10am - 10:00am Alex 14-212
Wednesday 9:10am - 11:00am Alex 14-212

Additional appointments: send email.


News and Notes

Old News and Notes

Course Materials

Syllabus Postscript PDF
Dennis Sun's Textbook GitHub
Jupyter Labs Server https://dev2.csc.calpoly.edu:5000/

Labs

Lab 1 Due: April 4 Python quiz Instructions Python Notebook [March 29, 2022]
Lab 2 Due: April 7 Python Data Frames Python Notebook [April 7, 2022]
Lab 3 Due: April 12 Work with Data Frame columns Python Notebook [April 7, 2022]
Lab 4 Due: April 14 Visualizations and Variable Transformations Python Notebook 1, Python Notebook 2 [April 12, 2022]
Lab 5 Due: April 19 Visualizations and Variable Transformations Python Notebook 1, Python Notebook 2 [April 14, 2022]
Lab 6 Due: April 21 Data Cubes and Contingency Tables Python Notebook 1, Python Notebook 2 [April 19, 2022]
Lab 7 Due: April 26 Analysis of Numeric Variables Python Notebook 1, Python Notebook 2 [April 26, 2022]
Mini Project 1 Due: May Data Preparation and Warehousing Postscript PDF Superstore.csv [April 26, 2022]
Lab 8 Due: April 28 Analysis of Categorical Variables Python Notebook [April 28, 2022]
Lab 9 Due: May 3 Linear Regression Python Notebook [April 28, 2022]
Lab 10 Due: May 5 K-Nearest Neighbors Python Notebook [May 5, 2022]
Lab 11 Due: May 10 Evaluation of Regression Models Python Notebook 1, Python Notebook 2 [May 5, 2022]
Lab 12 Due: May 17 Hyperparameter Tuning and Model Selection Python Notebook 1 [May 12, 2022]
Lab 13 Due: May 19 Classification Python Notebook 1, Python Notebook 2 [May 19, 2022]
Lab 14 Due: May 26 Clustering Python Notebook 1 [May 19, 2022]
Lab 15 Due: May 28 Vectorization, TF-IDF Python Notebook 1, Python Notebook 2 [May 23, 2022]
Lab 16 Due: May 31 Visualization Python Notebook 1, Python Notebook 2, Python Notebook 3 [May 26, 2022]
Mini Project 2 Due: June Data Science in action Postscript PDF [May 26, 2022]

Day by Day

DoW Date Lecture Notebooks Lab Materials Other
Tuesday March 29 Syllabus Lab quiz (Lab 1) Lecture 1, Lecture 2
Tuesday April 5 Data Science Process Chapter 1.1
titanic.csv
tips.csv
Complete Chapter 1.1 exercises Lecture 1, Lecture 2 Original Chapter 1.1. notebook
Thursday April 7 Data Acquisition Chapter 1.2
titanic.csv
tips.csv
Complete Chapter 1.2 exercises Lecture 3, Original Chapter 1.2. notebook
Tuesday April 12 Simple Visualization
Data Manipulation/Feature Engineering
Chapter 1.3
Chapter 1.4
titanic.csv
tips.csv
AmesHousing.txt
Complete Chapter 1.3
and Chapter 1.4 exercises
Tabular Data Original Chapter 1.3. notebook
Chapter 1.4. notebook
Thursday April 14 Grouping and Aggregation Chapter 2.1
Chapter 2.2
titanic.csv
tips.csv
Complete Chapter 2.1,
Chapter 2.2, exercises
Original Chapter 2.1, Chapter 2.2,
notebooks
Tuesday April 19 Data Cubes and Analysis of variables Chapter 2.3
Chapter 3.2
titanic.csv
tips.csv
Complete Chapter 2.3
and Chapter 3.2
Original Chapter 2.3. notebook
Chapter 3.2. notebook
Thursday April 21 Analysis of Numeric Variables Chapter 3.1
Chapter 4.1
titanic.csv
tips.csv
reds.csv
Complete Chapter 3.1,
Chapter 4.1, exercises
Original Chapter 3.1, Chapter 4.1,
notebooks
Tuesday April 26 Analysis of Categorical Variables Chapter 4.2
titanic.csv
tips.csv
Complete Chapter 4.2
Mini-project 1
Original Chapter 4.1. notebook
Thursday April 28 Linear Regression Chapter 5.1
AmesHousing.txt
reds.csv
Complete Chapter 5.1 exercises
Tuesday May 3 KNN Regression Chapter 5.2
AmesHousing.txt
tips.csv
Complete Chapter 5.2 exercises
Thursday May 5 Evaluation of Regression Methods Chapter 5.3
Chapter 5.4
AmesHousing.txt
tips.csv
Complete Chapter 5.3
and Chapter 5.4 exercises
Tuesday May 10 Lab Test
Thursday May 12 Hyperparameter Tuning, Model Selection, Ensembles Chapter 5.5
Chapter 5.5
AmesHousing.txt
Complete Chapter 5.5
Tuesday May 17 KNN Classification Chapter 6.1
Chapter 6.1
reds.csv
whites.csv
titanic.csv
Complete Chapter 6.1
Chapter 6.2 exercises
Lecture Notes
Thursday May 19 Classification: KNN Classifier Chapter 7.1
iris.csv
two_moons.csv
satellite.csv
reds.csv
whites.csv
titanic.csv
Complete Chapter 7.1 exercises Lecture Notes
Tuesday May 23 Work with Text Collections, Vectorization Chapter 13.1
Chapter 13.2
SMSSpamCollection.txt
profiles.csv
greeneggsandham.txt
Complete Chapter 13.1
Chapter 13.2 exercises
Lecture Notes
Thursday May 26 Visualization of Data Chapter 10.1
Chapter 10.2
Chapter 10.3
titanic.csv
AmesHousing.txt
Complete Chapter 10.1
Chapter 10.2
Chapter 10.3 exercises

Assigned Reading

Homeworks

Lecture Notes

Lecture 1 What is Data Science? Postscript PDF [March 28, 2016]
Lecture 2 Data Science Process Postscript PDF [April 3, 2016]
Lecture 3 Data Acquisition Postscript PDF [April 3, 2016]
Lecture 4 Tabular Data Postscript PDF [April 3, 2016]
Lecture 5 Textual Data Postscript PDF [April 5, 2016]
Lecture 6 XML Data Postscript PDF [April 11, 2016]
Lecture 7 Document Object Model (DOM) Postscript PDF [April 11, 2016]
Lecture 8 HTML and Beautiful Soup Postscript PDF [April 20, 2016]
Lecture 9 Maps and JSON Postscript PDF [April 20, 2016]
Lecture 14 Recommendation Predictions Postscript PDF [May 11, 2016]
Lecture 15 Supervised Learning (Classification) Postscript PDF [May 18, 2016]
Lecture 16 Unsupervised Learning (Clustering) Postscript PDF [May 23, 2016]


Other Materials


March 29, 2022, dekhtyar at csc.calpoly.edu