CSc 103 Programming Project

Famous Names

The goal of this project is to write a Java implementation of a generic Binary Search Tree and write a client application that uses it.

Overview - Frequency Analysis of Famous Names

Using an online database, the instructor has created a file of nearly 34,000 names of famous people.  Of course you are probably curious what the most common first names are. 

We want a program that can read a text file of names and analyze it to determine the frequency of occurrence of first names.  Assume that there is only one name per line and that the person's first name starts at the beginning of the line. Disregard differences between lower and upper case letters.  Valid names contain only alphabetic characters and have more than one letter. (Blanks and punctuation are not considered as letters).  The output report should show the number of times each name was found in the text, ordered by frequency, then by upper case name (alphabetically).  Each line contains the name, the count, and the percent of the total number of names (to 2 decimal places).

For example, an analysis of this data file would yield:
BOB 3  20.00
MARY 3 20.00
TOM 3 20.00
ED 2 13.33
JILL 1 6.67
MARK 1 6.67
PRINCE 1 6.67
REVEREND 1 6.67

Note that even though "REVEREND" is a title, because it appears at the start of the line we will treat it as a first name.

The application should obtain the input text from standard input.  (Do not print any prompt to the user). 

Software Design

Click on a class in the diagram to view the javadocs.

Class Diagram


You are to implement all the classes in the above design, except for FamousNameApp, BinaryNode and Natural.

BinarySearchTree   This is a generic version of StringSearchTree that we developed in lab.
NameCount, an ADT for a name count.
NameCountComparator, implements java.util.Comparator<NameCount>, to enable sorting on frequency.
FamousName, an ADT for a special kind of string that represents a famous name.
Document, process a stream of english text to count the famous names.
NameFrequencyReport, generate the finished report.

Here are the .class files for the instructor components in a JAR formatted file:
   FamousNameComponents.jar
   FamousNameComponents.jar (Ver 2)

For your convenience, skeletons.zip contains the skeletons for the files you need to implement.

Testing

You must write your own JUnit tests.

Your components will be tested with instructor reference tests.  The reference tests are JUnit tests of your classes (shown in green in the class diagram above).  There will be a performance test: the 34,000 names from the online database. There's a time limit of 20 seconds.

On this project your solution must conform to the class coding standard. Your unit tests must demonstrate "statement" coverage (they cause every statement in your program to be executed.)


Handing in Your Source Electronically