Lab 3, CSC/CPE 203 - collect data

Orientation

For this lab you will implement a program that mimics data analysis for an e-commerce site. You will design and implement a set of classes to represent the data stored in a "log" file, and then use that data to compute various statistics.

Objectives

Given Files

Retrieve the files provided for this lab here:

These files include base code to help you analyze the log data. Your final program will need to read e-commerce-like data from a file and compute statistics based on the information in the log. The base code implements the file reading but not the processing of the data read. For this lab, you will complete that processing, including putting the data into appropriate data structures (i.e., not an array of strings). You are encouraged to read through the provided code.

Log File

Examine the provided log files. For example, look at the contents of small.log. You will find various entries representing customer use of an imaginary e-commerce site. Each entry will appear on a single line in the file and consist of an entry type tag followed by the corresponding entry attributes. The different types of entries and their attributes are as follows.

Each sessionId, customerId, and productId is a unique identifier represented as a String; each price is an integer number of cents; and each quantity is an integer.

Task

Your program will take the name of a log file as a command-line argument, read the file, and store the entries for processing. It will then output the results of computing the statistics discussed below.
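A minimal sketch of the command-line handling, assuming the log is plain text with one whitespace-separated entry per line. The class name `LogReader` and the method `readEntries` are hypothetical. The provided base code already handles file reading, so treat this only as an illustration of taking the file name from `args[0]`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver: takes the log file name from the command line and
// splits each non-blank line into whitespace-separated words.
class LogReader {
    static List<String[]> readEntries(Path logFile) throws IOException {
        List<String[]> entries = new ArrayList<>();
        for (String line : Files.readAllLines(logFile)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {              // skip blank lines
                entries.add(trimmed.split("\\s+"));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java LogReader <logfile>");
            System.exit(1);
        }
        List<String[]> entries = readEntries(Paths.get(args[0]));
        System.out.println("Read " + entries.size() + " entries from " + args[0]);
    }
}
```

The word arrays returned here are only an intermediate form. Each one should be converted into an entry object right away rather than kept as the program's main data structure.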

You must decide how to represent the entries and which data structures to use for processing the data. Look at the provided code; it contains several clues. Though you might be tempted to simplify the data (since not all of it is used in the required analysis), avoid that temptation for now. Define classes, as appropriate, to represent the entries, giving consideration to the cohesion and coupling design principles.
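One possible design, sketched under stated assumptions: each entry type gets a small immutable class, and a separate parser turns a split line into an object. The names `View`, `Buy`, and `EntryParser` are hypothetical. The field order assumed on each line (`VIEW sessionId productId price` and `BUY sessionId productId price quantity`) is also an assumption; confirm the actual layout against small.log.

```java
// Hypothetical entry classes. The field order assumed by EntryParser is an
// illustration only; check it against the real log files.
final class View {
    private final String sessionId;
    private final String productId;
    private final int priceCents; // prices are integer numbers of cents

    View(String sessionId, String productId, int priceCents) {
        this.sessionId = sessionId;
        this.productId = productId;
        this.priceCents = priceCents;
    }

    String getSessionId() { return sessionId; }
    String getProductId() { return productId; }
    int getPriceCents() { return priceCents; }
}

final class Buy {
    private final String sessionId;
    private final String productId;
    private final int priceCents;
    private final int quantity;

    Buy(String sessionId, String productId, int priceCents, int quantity) {
        this.sessionId = sessionId;
        this.productId = productId;
        this.priceCents = priceCents;
        this.quantity = quantity;
    }

    String getSessionId() { return sessionId; }
    String getProductId() { return productId; }
    int getPriceCents() { return priceCents; }
    int getQuantity() { return quantity; }
}

// Keeps knowledge of the line format in one place, so the entry classes stay
// focused on holding data (cohesion) and do not depend on the file layout (coupling).
final class EntryParser {
    static View parseView(String[] words) {
        return new View(words[1], words[2], Integer.parseInt(words[3]));
    }

    static Buy parseBuy(String[] words) {
        return new Buy(words[1], words[2],
                Integer.parseInt(words[3]), Integer.parseInt(words[4]));
    }
}
```

START and END entries would get similar classes. Because the parsing lives in one class, a change to the file format touches only `EntryParser`.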

Note that the provided source file contains an example of such processing, including the use of a data structure (a map) to store some of the information. Given the material discussed thus far, this data structure is used by first creating it and then passing it to another method to be populated; the populated data structure can then be used after the file processing completes. Some of you may not care for this mutation-based style, which is fine, but it is the most direct approach at this point.
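A sketch of the create-then-populate style, assuming START lines have the form `START sessionId customerId`. `SessionIndex`, `populate`, and `processStart` are hypothetical names, not the ones in the provided code. The caller creates an empty map, passes it in to be filled, and reads it afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the mutation-based style: the caller owns the map,
// and these methods only add to it.
class SessionIndex {
    // Records the session under its customer (assumed layout: START sessionId customerId).
    static void processStart(String[] words, Map<String, List<String>> sessionsFromCustomer) {
        String sessionId = words[1];
        String customerId = words[2];
        sessionsFromCustomer.computeIfAbsent(customerId, k -> new ArrayList<>()).add(sessionId);
    }

    // Walks every split line and populates the caller's map; other tags are ignored here.
    static void populate(List<String[]> entries, Map<String, List<String>> sessionsFromCustomer) {
        for (String[] words : entries) {
            if (words[0].equals("START")) {
                processStart(words, sessionsFromCustomer);
            }
        }
    }
}
```

After `populate` returns, the same map object holds one list of session IDs per customer, ready for the statistics code.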

Your program is only expected to correctly analyze logically organized log files. In particular, for a given sessionId, all VIEW and BUY entries will come after a START entry and before an END entry, if one is present (a session may not yet have ended when the log snapshot was taken).
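Because a session's START entry always precedes its VIEW and BUY entries, a single pass can record each session's customer at START and look it up later. The sketch below counts VIEW entries per customer to show this. The helper `viewsPerCustomer` is hypothetical, and the line layouts `START sessionId customerId` and `VIEW sessionId productId price` are assumptions. A session with no END entry needs no special handling.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-pass sketch that relies on START preceding VIEW/BUY.
class SinglePass {
    static Map<String, Integer> viewsPerCustomer(List<String[]> entries) {
        Map<String, String> customerFromSession = new HashMap<>();
        Map<String, Integer> views = new HashMap<>();
        for (String[] words : entries) {
            switch (words[0]) {
                case "START":
                    customerFromSession.put(words[1], words[2]);
                    break;
                case "VIEW":
                    // The session's START has already been seen, so this lookup succeeds.
                    views.merge(customerFromSession.get(words[1]), 1, Integer::sum);
                    break;
                default:
                    // BUY and END are not needed for this count; a missing END is harmless.
                    break;
            }
        }
        return views;
    }
}
```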

Statistics

Your program must compute the following statistics.
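The sketch below assumes an interpretation of the first statistic, not the official definition: "Average Views without Purchase" is taken to mean the mean number of VIEW entries per session, over sessions that contain no BUY entry. It also assumes that a started session with zero views counts toward the average. `AverageViews` is a hypothetical name, and for brevity it works on split lines rather than entry objects.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the statistic's definition here is an ASSUMPTION.
class AverageViews {
    static double averageViewsWithoutPurchase(List<String[]> entries) {
        Map<String, Integer> viewsBySession = new HashMap<>();
        Set<String> buyingSessions = new HashSet<>();
        for (String[] words : entries) {
            String tag = words[0];
            String sessionId = words[1];
            if (tag.equals("START")) {
                viewsBySession.putIfAbsent(sessionId, 0); // ASSUMPTION: zero-view sessions count
            } else if (tag.equals("VIEW")) {
                viewsBySession.merge(sessionId, 1, Integer::sum);
            } else if (tag.equals("BUY")) {
                buyingSessions.add(sessionId);
            }
        }
        int sessions = 0;
        int views = 0;
        for (Map.Entry<String, Integer> e : viewsBySession.entrySet()) {
            if (!buyingSessions.contains(e.getKey())) {
                sessions++;
                views += e.getValue();
            }
        }
        // Avoid dividing by zero when every session made a purchase.
        return sessions == 0 ? 0.0 : (double) views / sessions;
    }
}
```

Printing the returned `double` directly gives the one-decimal style shown in the sample output (for example, `3.0`).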

The output of your program when run on the provided small.log file should be as follows (the order within each breakdown is not significant).

Average Views without Purchase: 3.0

Price Difference for Purchased Product by Session
session4
   product1 -100.0
session2
   product1 -25.0
   product3 175.0

Number of Views for Purchased Product by Customer
customer2
   product1 1
customer1
   product1 2
   product3 2
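A sketch of the per-session price difference, under an ASSUMPTION about its meaning: each purchased product's price minus the average price over all VIEW entries in the same session, in cents. `PriceDifference` and `bySession` are hypothetical names. The line layouts `VIEW sessionId productId price` and `BUY sessionId productId price quantity` are also assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the statistic's definition here is an ASSUMPTION.
class PriceDifference {
    // Result: sessionId -> (productId -> purchase price minus average viewed price)
    static Map<String, Map<String, Double>> bySession(List<String[]> entries) {
        Map<String, List<Integer>> viewPrices = new HashMap<>();
        Map<String, List<String[]>> buys = new LinkedHashMap<>();
        for (String[] words : entries) {
            if (words[0].equals("VIEW")) {
                viewPrices.computeIfAbsent(words[1], k -> new ArrayList<>())
                        .add(Integer.parseInt(words[3]));
            } else if (words[0].equals("BUY")) {
                buys.computeIfAbsent(words[1], k -> new ArrayList<>()).add(words);
            }
        }
        Map<String, Map<String, Double>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String[]>> e : buys.entrySet()) {
            List<Integer> prices = viewPrices.getOrDefault(e.getKey(), new ArrayList<>());
            // Average viewed price for the session; 0.0 if it had no views.
            double avg = prices.stream().mapToInt(Integer::intValue).average().orElse(0.0);
            Map<String, Double> diffs = new LinkedHashMap<>();
            for (String[] buy : e.getValue()) {
                diffs.put(buy[2], Integer.parseInt(buy[3]) - avg);
            }
            result.put(e.getKey(), diffs);
        }
        return result;
    }
}
```

Sessions without a BUY entry are omitted, a session with no VIEW entries uses an average of 0.0, and a product bought twice in one session keeps only its last difference; adjust these choices if the official definition differs. Printing each session ID followed by lines of `"   " + productId + " " + difference` mirrors the layout of the sample output.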

How to tackle this lab...

This is a longer lab and requires you to solve some data processing challenges on your own. You might want to consider these steps.

Submission

This lab is due by the end of lab in 1.5 weeks (either Tuesday 10/10 or Wednesday 10/11, depending on your section). Be sure to commit your code by the due date. Your instructor reserves the right to run further tests on your code. Demonstrate your working program to your instructor, and be prepared to show your source code. Partial credit is available at the instructor's discretion.