Instructor: Alexander Dekhtyar, dekhtyar@calpoly.edu, 14-210
Office Hours:
Day | Time | Who | Where |
Wednesday | 8:30am - 10:00am | Alex | 14-215 |
Thursday | 1:10pm - 2:00pm | Alex | 14-215 |
Friday | 8:30am - 10:00am | Alex | 14-215 |
Additional appointments: send email.
Syllabus | Postscript |
Lab 1, Part 1 | Due: January 13 | JSON Generation | Postscript | Lab Data | [January 10, 2017] | |
Lab 1, Part 2 | Due: January 20 | JSON Generation | Postscript | Lab Data | [January 13, 2017] | |
Lab 2 | Due: January 25 | MongoDB find() queries | Postscript | [January 23, 2017] | ||
Lab 4 | Due: February 1 | MongoDB Aggregate Pipelines | Postscript | [January 27, 2017] | ||
Lab 5 | Due: February 10 | MongoDB application | Postscript | Lab Info | [February 4, 2017] | |
Lab 6 | Due: February 17 | Getting Hadoop to Work | Postscript | Lab Info | [February 15, 2017] | |
Lab 7 | Due: February 24 | Simple Hadoop Programs | Postscript | Lab Info | [February 17, 2017] | |
Lab 8 | Due: March 3 | Medium-difficulty Hadoop Programs | Postscript | [February 27, 2017] | ||
Lab 9 | Due: March 19 | Real Data Hadoop Programs | Postscript | Lab Info | [March 5, 2017] |
January 30, February 1 | mongo-queries.txt | st collection | prof collection |
org.apache.hadoop Version 2.7 javadocs | API | |
Bash local variable settings | bashrc-commands.txt | Paste into the bottom of your .bashrc file |
MapReduce (Hadoop v. 2.7) tutorial | HTML |
Hadoop program template | template.java | |
Our first Hadoop program | switchMR.java | |
Data file for switchMR.java | data | |
Input Format Tests | ||
---|---|---|
TextInputFormat test | FITest.java | |
KeyValueTextInputFormat test | KeyValueTest.java | |
FixedRecordInputFormat test | FixedRecordTest.java | |
NLineInputFormat test | NLTest.java | |
Multiple chained MapReduce jobs | filter.java | words (input file) |
Multiple Input Files/Multiple Mappers | multiInMR.java | users.in, messages.in (input files) |
Use of JSON | ||
Using JSON objects | JsonJob.java | json.in,simple.json (input files) |
Multiline JSON | MultilineJsonJob.java | test.json (input file) |
Multiline JSON Input Format | json-mapreduce-1.0.jar | |
Advanced Hadoop Features | ||
Finding Max | FindMax.java | numbers.txt |
Map-Side Join with Distributed Cache | dCacheDemo.java | users.in, messages.in (input files) |
Combiner Test: graph scan with no Combiner | TwitterTest.java | |
Combiner Test: graph scan with Combiner | CombinerTest.java |
Lecture 1 | What's in this class? | Postscript | [January 4, 2016] | |
Lecture 2 | Motivating Examples | Postscript | [January 4, 2016] | |
Lecture 3 | Maps, Dictionaries, Key-Value Pairs | Postscript | [January 12, 2016] | |
Lecture 3-1 | JSON | Postscript | [January 10, 2017] | |
Lecture 4 | MongoDB Basics | Postscript | [January 18, 2016] | |
Lecture 5 | MongoDB Java Connectivity | Postscript | [January 28, 2016] | |
Lecture 6 | MongoDB Aggregation Pipeline | Postscript | [January 27, 2017] | |
Lecture 7 | MongoDB Aggregation Pipeline: Part 2 | Postscript | [February 3, 2017] | |
Lecture 8 | Overview of Distributed Systems | Postscript | [February 4, 2017] | |
Lecture 9 | MapReduce | Postscript | [January 28, 2016] | |
Lecture 10 | Hadoop on CSLVM cluster | Postscript | [February 17, 2016] | |
Lecture 11 | HDFS commands primer | Postscript | [February 13, 2017] | |
Lecture 12 | Hadoop Input Data Formats | Postscript | [February 21, 2017] | |
Lecture 14 | Matrix Multiplication in MapReduce | Postscript | [March 5, 2017] | |
Lecture 15 | MapReduce for Top K Problem | Postscript | [March 10, 2017] |
JSON home page | json.org |
JSON specification | ECMA-404: The JSON Data Interchange Format (PDF) |
org.json Javadocs | Javadoc |