Data 401: Advanced Topics in Data Science

Alexander Dekhtyar and Dennis Sun, Cal Poly, Fall 2017


Course Info

  • Class Meetings: Mon, Wed in 14-256, 4:10-7 PM
  • Course Website: http://users.csc.calpoly.edu/~dsun09/data401

Staff

Prof. Alex Dekhtyar
  • Office: 14-210
  • Office Hours: Mon 11 AM-noon, Wed 9 AM-noon
Prof. Dennis Sun
  • Office: 25-105
  • In-Person Office Hours: Tues 4:10-5 PM, Wed 1:10-2 PM
  • Google Hangouts Office Hours: Sun 9:10-10 PM, Tues 7:30-8:20 PM
Teaching Assistant
  • Michael Halverson

Course Prerequisites and Goals

This is a rigorous class that will synthesize much of what you have learned in the past two years in the Data Science minor.

In particular, we will draw upon your existing knowledge in:

  • probability
  • linear algebra
  • regression models
  • algorithms
  • scientific computing

This class is taught in Python. We will assume that you are already comfortable using NumPy, pandas, and Matplotlib.

    By the end of this course, you will understand how the pieces fit together -- how statistics, computer science, and mathematics combine to make up the field of data science.

    Class Structure

    First, although some of you are enrolled in Dr. Dekhtyar's section and others are enrolled in Dr. Sun's, there is no difference between the two sections. You will all meet in a single section co-taught by both of us, and you will all be graded according to the same standards.

    Each class meeting is nearly 3 hours long. We realize that this is long, so we will try to mix up the activities. There will be some lecture, some coding, some group work, some discussion, and some quizzes.

    Dr. Sun and Dr. Dekhtyar have different lecturing styles:

    • Dr. Sun will lecture from slides and provide handouts of the slides. The handouts are incomplete, so you may want to follow along and take notes. A completed version of each handout will be posted after the lecture, so you will not miss anything if you prefer to just listen, instead of taking notes.
    • Dr. Dekhtyar will lecture at the board from a detailed set of handouts, which he will provide.

    The schedule for every lecture can be found on the Lectures page.

    All coding will be done on JupyterHub. This means that you should not have to install any software to take this class. We will need to add your Github ID for you to be able to access JupyterHub, so please fill out the survey here.

    Communication Policy

    Because there are two sections, Dr. Sun cannot e-mail students in Dr. Dekhtyar's section, and Dr. Dekhtyar cannot e-mail students in Dr. Sun's. To make communication easier, we will post all announcements on the Piazza forum. We will assume that all of you have an account on Piazza. If you do not, you may miss important communications from us.

    Grading

    Your overall grade will be calculated according to the following weighting scheme.

    Component                                                 Weight
    3 Projects (best 2 count 15% each, lowest counts 5%)      35%
    6 Quizzes (lowest score dropped)                          25%
    4 Writing Assignments (lowest score dropped)              15%
    Review Paper                                              25%
    Overall                                                   100%
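
    The weighting scheme above can be sketched in Python as follows. This is only an illustration, not the official grade calculation: it assumes every score is on a 0-100 scale and that the quizzes and writing assignments that remain after dropping the lowest are averaged with equal weight. The function names and the sample scores are hypothetical.

```python
def drop_lowest_mean(scores):
    """Average the scores after dropping the single lowest one.

    Assumption: the remaining scores count equally.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop one")
    kept = sorted(scores)[1:]  # sorted ascending, so index 0 is the lowest
    return sum(kept) / len(kept)


def project_points(projects):
    """Best two projects count 15% each; the lowest counts 5%."""
    if len(projects) != 3:
        raise ValueError("expected exactly three project scores")
    lowest, middle, best = sorted(projects)
    return 0.15 * best + 0.15 * middle + 0.05 * lowest


def overall_grade(projects, quizzes, writing, review_paper):
    """Combine the four components using the weights from the table."""
    return (
        project_points(projects)                # 35% in total
        + 0.25 * drop_lowest_mean(quizzes)      # 25%
        + 0.15 * drop_lowest_mean(writing)      # 15%
        + 0.25 * review_paper                   # 25%
    )


# Made-up scores, for illustration only.
example = overall_grade(
    projects=[90, 80, 60],
    quizzes=[100, 90, 80, 70, 60, 50],
    writing=[85, 95, 75, 90],
    review_paper=88,
)
print(round(example, 2))
```

    With these made-up scores, the projects contribute 28.5 points, the quizzes 20, the writing assignments 13.5, and the review paper 22.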

    Recommended Books

    Goodfellow et al. Deep Learning. MIT Press, 2016.
    Available for free online.