CSC 369: Distributed Computing
Winter 2017

Instructor: Alexander Dekhtyar, dekhtyar@calpoly.edu, 14-210

Office Hours:
When Who Where
Wednesday 8:30am - 10:00am Alex 14-215
Thursday 1:10pm - 2:00pm Alex 14-215
Friday 8:30am - 10:00am Alex 14-215

Additional appointments: send email.


News and Notes

Old News and Notes

Course Materials

Syllabus Postscript PDF

Labs

Lab 1, Part 1 Due: January 13 JSON Generation Postscript PDF Lab Data [January 10, 2017]
Lab 1, Part 2 Due: January 20 JSON Generation Postscript PDF Lab Data [January 13, 2017]
Lab 2 Due: January 25 MongoDB find() queries Postscript PDF [January 23, 2017]
Lab 4 Due: February 1 MongoDB Aggregate Pipelines Postscript PDF [January 27, 2017]
Lab 5 Due: February 10 MongoDB application Postscript PDF Lab Info [February 4, 2017]
Lab 6 Due: February 17 Getting Hadoop to Work Postscript PDF Lab Info [February 15, 2017]
Lab 7 Due: February 24 Simple Hadoop Programs Postscript PDF Lab Info [February 17, 2017]
Lab 8 Due: March 3 Medium-difficulty Hadoop Programs Postscript PDF [February 27, 2017]
Lab 9 Due: March 19 Real Data Hadoop Programs Postscript PDF Lab Info [March 5, 2017]

MongoDB

Queries
January 30, February 1 mongo-queries.txt st collection prof collection

Hadoop

Resources

org.apache.hadoop Version 2.7 javadocs API
Bash local variable settings bashrc-commands.txt Paste into the bottom of your .bashrc file
MapReduce (Hadoop v. 2.7) tutorial HTML

Code

Hadoop program template template.java
Our first Hadoop program switchMR.java
Data file for switchMR.java data
Input Format Tests
TextInputFormat test FITest.java
KeyValueTextInputFormat test KeyValueTest.java
FixedRecordInputFormat test FixedRecordTest.java
NLineInputFormat test NLTest.java
Multiple chained MapReduce jobs filter.java words (input file)
Multiple Input Files/Multiple Mappers multiInMR.java users.in, messages.in (input files)
Use of JSON
Using JSON objects JsonJob.java json.in, simple.json (input files)
Multiline JSON MultilineJsonJob.java test.json (input file)
Multiline JSON Input Format json-mapreduce-1.0.jar
Advanced Hadoop Features
Finding Max FindMax.java numbers.txt
Map-Side Join with Distributed Cache dCacheDemo.java users.in, messages.in (input files)
Combiner Test: graph scan with no Combiner TwitterTest.java
Combiner Test: graph scan with Combiner CombinerTest.java

Homeworks

Lecture Notes

Lecture 1 What's in this class? Postscript PDF [January 4, 2016]
Lecture 2 Motivating Examples Postscript PDF [January 4, 2016]
Lecture 3 Maps, Dictionaries, Key-Value Pairs Postscript PDF [January 12, 2016]
Lecture 3-1 JSON Postscript PDF [January 10, 2017]
Lecture 4 MongoDB Basics Postscript PDF [January 18, 2016]
Lecture 5 MongoDB Java Connectivity Postscript PDF [January 28, 2016]
Lecture 6 MongoDB Aggregation Pipeline Postscript PDF [January 27, 2017]
Lecture 7 MongoDB Aggregation Pipeline: Part 2 Postscript PDF [February 3, 2017]
Lecture 8 Overview of Distributed Systems Postscript PDF [February 4, 2017]
Lecture 9 MapReduce Postscript PDF [January 28, 2016]
Lecture 10 Hadoop on CSLVM cluster Postscript PDF [February 17, 2016]
Lecture 11 HDFS commands primer Postscript PDF [February 13, 2017]
Lecture 12 Hadoop Input Data Formats Postscript PDF [February 21, 2017]
Lecture 14 Matrix Multiplication in MapReduce Postscript PDF [March 5, 2017]
Lecture 15 MapReduce for Top K Problem Postscript PDF [March 10, 2017]


Other Materials

JSON

JSON home page json.org
JSON specification ECMA-404: The JSON Data Interchange Format (PDF)
org.json Javadocs Javadoc

January 10, 2017, dekhtyar at csc.calpoly.edu