CSC 369: Distributed Computing
Spring 2020

Instructor: Alex Dekhtyar, 14-210

Office Hours:
When                       Who   Where
Monday 1:10pm - 2:00pm     Alex  Our Lecture Zoom
Tuesday 1:10pm - 3:00pm    Alex  Office Hours Zoom
Friday 1:10pm - 2:00pm     Alex  Our Lecture Zoom

Note: Zoom links are not shown on this page, to keep web crawlers from finding them. They were emailed to you and are also available on request by email or Slack.

Additional appointments: send email.

News and Notes

Old News and Notes

Course Materials

Syllabus Postscript PDF
Canvas site Canvas
Cal Poly Zoom HTML
CSC 369 Slack Channel HTML
Campus VPN Instructions MacOS and Windows Linux (.txt)
Survey 2 Google Form


Lab 1 Due: April 13 JSON Manipulation Postscript PDF Lab Data (daily.json) Source [April 5, 2020]
Lab 2 Due: April 17 MongoDB: first steps Postscript PDF Lab Data (daily.json) Source [April 13, 2020]
Lab 3 Due: April 24 MongoDB aggregation pipelines Postscript PDF [April 21, 2020]
Lab 4 Due: May 10 MongoDB Application PDF Lab Info [April 28, 2020]
Lab 5 Due: May 6 First Hadoop Program Postscript PDF Lab Info [May 4, 2020]
Lab 6 Due: May 18 Hadoop Programs Postscript PDF [May 8, 2020]
Lab 7 Due: June 9 (originally June 5) Hadoop Research Mini-project Postscript PDF [May 20, 2020]
Lab 8-1 Due: June 9 Spark PDF [May 30, 2020]
Lab 8-2 Due: June 9 Spark Data Frames PDF README [June 4, 2020]
Optional Mini-Project Due: June 14 (noon) Distributed Computing with Real Data PDF [June 6, 2020]

JSON Resources

JSON home page/List of JSON Java, Python libraries: scroll to the bottom of the page
JSON specification ECMA-404: The JSON Data Interchange Format (PDF)
org.json Javadocs Javadoc
Local copy of org.json library org.json-20120521.jar Put it in the same directory as your code for now

Sample Code

org.json JSONArray demo
data file for p.json
bash script for of jsonArra
Reading from a JSON array file one object at a time
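
The last item above (reading a JSON array file one object at a time) can be illustrated with a short sketch. This is a Python illustration using only the standard json module, not the org.json Java demo linked above. The function name iter_json_array, the chunk_size parameter, and the sample input are all hypothetical, and the sketch assumes every element of the top-level array is a JSON object or array (a bare number split across a chunk boundary could be mis-read).

```python
import io
import json


def iter_json_array(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array one at a time.

    Assumption: each element is a JSON object or array, so an element
    that has only partly arrived always fails to decode and we read more.
    """
    decoder = json.JSONDecoder()
    buf = ""
    started = False  # True once the opening '[' has been consumed
    eof = False
    while not eof:
        chunk = stream.read(chunk_size)
        eof = chunk == ""
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break  # nothing buffered; read more input
            if not started:
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                started = True
                buf = buf[1:]
            elif buf[0] == ",":
                buf = buf[1:]  # separator between elements
            elif buf[0] == "]":
                return  # end of the array
            else:
                try:
                    obj, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    if eof:
                        raise  # truly malformed input
                    break  # element is incomplete; read more input
                yield obj
                buf = buf[end:]
    raise ValueError("input ended before the closing ']'")


# Made-up sample data; a tiny chunk size forces elements to span reads.
sample = io.StringIO('[{"id": 1}, {"id": 2, "tags": ["x", "y"]}]')
for doc in iter_json_array(sample, chunk_size=8):
    print(doc)
```

Only one element plus the unread remainder of the current chunk is held in memory at a time, which is the point of reading object by object rather than loading the whole array.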

MongoDB Resources


April 4, 2020 General welcome, and MongoDB login process Slides

MongoDB Documentation
MongoDB 4.2 Documentation HTML
mongo shell HTML
Create, Read, Update, Delete (CRUD) HTML
db.collection.find() HTML
Aggregation Pipeline Overview, db.collection.aggregate(), Aggregation pipeline stages

Python MongoDB API

PyMongo 3.10.1 Documentation HTML
Tutorial HTML
Authentication example HTML
Aggregation pipeline example HTML
Additional examples HTML

Code and Queries

MongoDB Python API example example.out (output)

Hadoop Resources

Hadoop resources and code are posted here.

Hadoop Cluster Monitor

Monitor Hadoop jobs here


The Original MapReduce paper PDF
org.apache.hadoop Version 3.2.1. javadocs API
org.apache.hadoop Version 2.7 Jar file hadoop-core-1.2.1.jar
Bash local variable settings bashrc-commands.txt Paste into the bottom of your .bashrc file
MapReduce (Hadoop v. 2.7) tutorial HTML


Code samples discussed in class are posted here

Hadoop program template
Our first Hadoop program
Data file for data.csv
Input Format Tests
TextInputFormat test
KeyValueTextInputFormat test
FixedRecordInputFormat test
NLineInputFormat test
One-mapper/One-reducer version of
Multiple chained MapReduce jobs words (input file)
Multiple Input Files/Multiple Mappers, (input files)
Map-Side Join with Distributed Cache, (input files)
Use of JSON
Using JSON objects, simple.json (input files)
Multiline JSON test.json (input file)
Multiline JSON Input Format json-mapreduce-1.0.jar
Advanced Hadoop Features
Finding Max numbers.txt
Combiner Test: graph scan with no Combiner
Combiner Test: graph scan with Combiner
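
For context on the two Combiner Test items above: a combiner is a local reduce step applied to each mapper's output before the shuffle, so fewer records travel to the reducers. The sketch below simulates this in plain Python for a find-the-maximum job. It does not use the Hadoop API and is not the class code; the function names and the input splits are made up for illustration.

```python
from collections import defaultdict


def map_split(split):
    """Mapper: emit (key, value) pairs; every number goes under one key."""
    return [("max", n) for n in split]


def combine(pairs):
    """Combiner: reduce one mapper's output locally to one pair per key."""
    best = {}
    for key, value in pairs:
        best[key] = max(value, best.get(key, value))
    return list(best.items())


def reduce_all(shuffled):
    """Reducer: group pairs by key and take the maximum of each group."""
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return {key: max(values) for key, values in groups.items()}


def run(splits, use_combiner):
    """Return the job result and the number of records sent to the shuffle."""
    shuffled = []
    for split in splits:
        pairs = map_split(split)
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_all(shuffled), len(shuffled)


splits = [[3, 17, 8], [42, 5], [9, 9, 1, 30]]  # hypothetical input splits
print(run(splits, use_combiner=False))  # ({'max': 42}, 9)
print(run(splits, use_combiner=True))   # ({'max': 42}, 3)
```

The result is the same either way because max is associative and commutative; only the number of shuffled records changes (9 without the combiner, 3 with it).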

Spark Resources

Spark resources and Spark code discussed in class will go here


The Original Spark paper (USENIX Cloud Computing'2010) PDF
Resilient Distributed Datasets (USENIX NSDI'2012) PDF
PySpark Documentation (version 2.4.5) HTML
Wenqiang Feng: Learning Apache Spark with Python PDF
Running PySpark Applications on Googledoc
PySpark RDD API annotated Googledoc


In-class Example (March 1 lecture)
Use of Hadoop Files



Note: Recordings are available on the Canvas page.
April 6 (Monday) Introduction Slides (PDF) Notes (PDF)
April 8 (Wednesday) Key-Value Stores Slides (PDF) Notes #1 (PDF), Notes #2 (PDF)
April 10 (Friday) Distributed DBMS/The CAP Theorem Slides (PDF) Notes (PDF)
April 13 (Monday) MongoDB Basics Slides (PDF) Notes (PDF)
April 15 (Wednesday) Problem Decomposition, Data Manipulation Algebra Slides (PDF)
April 17 (Friday) MongoDB Aggregation Pipeline Slides (PDF) Notes (PDF)
April 20 (Monday) MongoDB Aggregation Pipeline Slides (PDF) Notes (PDF)
April 22 (Wednesday) MongoDB Aggregation Pipeline (continued) Slides (PDF)
April 24 (Friday) Quiz 1
April 27 (Monday) Lab Exam 1
April 29 (Wednesday) Overview of Distributed Systems Slides (PDF) Notes (PDF)
May 1 (Friday) MapReduce Slides (PDF) Notes (PDF)
May 4 (Monday) Introduction to Hadoop Slides (PDF) Notes #1 (PDF), Notes #2 (PDF)
May 6 (Wednesday) Hadoop Input Data Types Slides (PDF) Notes (PDF)
May 8 (Friday) Midterm postmortem, Hadoop API Slides (PDF)
May 11 (Monday) Joins in MapReduce Slides (PDF) Notes (PDF)

Lecture Notes

Lecture 1 What's in this class? Postscript PDF [January 4, 2016]
Lecture 2 Motivating Examples Postscript PDF [January 4, 2016]
Lecture 2-1 JSON Postscript PDF [January 10, 2017]
Lecture 3-1 Distributed Databases and The CAP Theorem Postscript PDF [April 12, 2020]
Lecture 3-2 Maps, Dictionaries, Key-Value Pairs Postscript PDF [January 12, 2016]
Lecture 4 MongoDB Basics Postscript PDF [January 18, 2016]
Lecture 5 MongoDB Java Connectivity Postscript PDF [January 28, 2016]
Lecture 6 MongoDB Aggregation Pipeline Postscript PDF [January 27, 2017]
Lecture 7 MongoDB Aggregation Pipeline: Part 2 Postscript PDF [Feb 3, 2017]
Lecture 8 Overview of Distributed Systems Postscript PDF [February 4, 2017]
Lecture 9 MapReduce Postscript PDF [January 28, 2016]
Lecture 10 Hadoop on our cluster Postscript PDF [February 4, 2019]
Lecture 11 HDFS commands primer Postscript PDF [February 13, 2017]
Lecture 12 Hadoop Input Data Formats Postscript PDF [February 21, 2017]
Lecture 13 Joins in MapReduce Postscript PDF [May 11, 2020]
Lecture 14 Matrix Multiplication in MapReduce Postscript PDF [March 5, 2017]
Lecture 15 MapReduce for Top K Problem Postscript PDF [March 10, 2017]
Lecture 16 Resilient Distributed Datasets Postscript PDF [February 28, 2019]

January 7, 2019, dekhtyar at