Lab Activity

Famous Names

Overview - Frequency Analysis of Famous Names

Using an online database, the instructor has created a file of nearly 34,000 names of famous people.  Here's a brief excerpt:

Linus Torvalds
Arturo Toscanini
Peter Tosh
Nina Totenberg
Blanche M. Touhill
Marie J. Toulantis
Henri de Toulouse-Lautrec

Let's imagine someone is curious to know what the most common first names are. 

We want a program that can read a text file of names and analyze it to determine the frequency of occurrence of first names. 
Assume that there is only one name per line and that the person's first name starts at the beginning of the line and ends at the first blank. Valid first names contain only alphabetic characters and have more than one letter. Ignore invalid first names such as J'Nae. Disregard differences between lower and upper case letters.
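
The rules above can be sketched as a single helper method. This is only an illustration: the class FirstNameSketch and the method firstName() are hypothetical names, not part of the provided skeletons. It also assumes that a line with no blank at all is treated as one whole name, and that Character.isLetter() is an acceptable test for "alphabetic".

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration; not one of the provided skeleton classes.
class FirstNameSketch {

    /**
     * Returns the upper-cased first name on a line, or Optional.empty()
     * when the line does not start with a valid first name.
     */
    static Optional<String> firstName(String line) {
        // The first name runs from the start of the line to the first blank.
        // Assumption: a line with no blank is treated as one whole name.
        int blank = line.indexOf(' ');
        String candidate = (blank < 0) ? line : line.substring(0, blank);

        // Valid names have more than one letter.
        if (candidate.length() < 2) {
            return Optional.empty();
        }
        // Valid names contain only alphabetic characters.
        // Assumption: Character.isLetter() is an acceptable test for "alphabetic".
        for (int i = 0; i < candidate.length(); i++) {
            if (!Character.isLetter(candidate.charAt(i))) {
                return Optional.empty();
            }
        }
        // Case differences are disregarded, so normalize to upper case.
        return Optional.of(candidate.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(firstName("Linus Torvalds"));  // Optional[LINUS]
        System.out.println(firstName("J'Nae Example"));   // Optional.empty
        System.out.println(firstName("J Example"));       // Optional.empty
    }
}
```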

The program should produce two kinds of reports: one with the names sorted alphabetically, and one sorted by frequency (the number of times each name was found in the text, most frequent first), with ties ordered alphabetically by upper case name.  Each line contains the name and its count.  The kind of report to generate is specified by a command line argument.
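
To make the two orderings concrete, here is a minimal sketch that works on a plain map from upper case name to count. The class ReportOrderSketch and its methods are hypothetical and are not the design in the skeletons; they only illustrate the ordering rules, with the most frequent names listed first in the frequency report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the two report orderings; not the skeleton design.
class ReportOrderSketch {

    /** Lines of the form "NAME count", ordered alphabetically by name. */
    static List<String> alphabetical(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        // A TreeMap iterates over its keys in their natural (alphabetical) order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }

    /** Lines ordered by count, highest first; equal counts are ordered by name. */
    static List<String> byFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("TOM", 3);
        counts.put("ED", 2);
        counts.put("BOB", 3);
        counts.put("MARY", 3);
        counts.put("JILL", 1);
        for (String line : byFrequency(counts)) {
            System.out.println(line);
        }
    }
}
```

With these five counts, the frequency ordering lists BOB, MARY and TOM (3 each, alphabetical among themselves), then ED, then JILL.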

Given the data file samplenames.txt, sample executions follow:

java NameFrequencyReport -F < samplenames.txt

BOB 3
MARY 3
TOM 3
ED 2
JILL 1
MARK 1
PRINCE 1
REVEREND 1

java NameFrequencyReport < samplenames.txt

BOB 3
ED 2
JILL 1
MARK 1
MARY 3
PRINCE 1
REVEREND 1
TOM 3

Note that even though "REVEREND" is a title, because it appears at the start of the line we will treat it as a first name.

The application should obtain the input text from standard input.  (Do not print any prompt to the user.)
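
As a rough sketch of the input handling, the helper below reads every line from a Reader and checks for the -F argument used in the sample execution. StdinSketch, readLines() and wantsFrequencyReport() are hypothetical names chosen for this example. Taking a Reader rather than using System.in directly is a design assumption: the real program would pass new InputStreamReader(System.in), while a test can pass a StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of reading the input; not one of the skeleton classes.
class StdinSketch {

    /** Reads all lines from the given source without printing any prompt. */
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** True when the first command line argument is -F, as in the sample run. */
    static boolean wantsFrequencyReport(String[] args) {
        return args.length > 0 && args[0].equals("-F");
    }

    public static void main(String[] args) {
        // Stand-in input for the demo; the real program would instead pass
        // new java.io.InputStreamReader(System.in).
        List<String> lines = readLines(new StringReader("Bob Smith\nMary Jones\n"));
        String kind = wantsFrequencyReport(args) ? "frequency" : "alphabetical";
        System.out.println(lines.size() + " lines read, " + kind + " report requested");
    }
}
```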


Software Design

The design has been created for you, and the class skeletons are provided here: skeletons.zip
You are to implement the bodies of these skeletons.

Testing

You must write a JUnit test class for each class except the main class.

If you are curious to run your program with the entire list of nearly 34,000 names, famousnames.txt is available (516 KB).

Submission

Submit a zip file of your BlueJ project folder.



FAQ

Q: In the NameCount class, are you sure that we are not to have a getName() method? Otherwise, we would have to do string parsing when we're trying to implement the compareTo() method.
A: There is no getName() method and you don't have to do any string parsing. I suspect you are forgetting that compareTo() has access to all of an object's fields. Perhaps this reference will help.
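
To illustrate the point about field access, here is a small sketch using a made-up class, Tally, which is not the NameCount skeleton. In Java, private access is granted per class rather than per object, so compareTo() can read the private fields of the other instance directly, with no getter and no string parsing.

```java
// Hypothetical class for illustration only; it is not the NameCount skeleton.
class Tally implements Comparable<Tally> {
    private final String label;
    private final int total;

    Tally(String label, int total) {
        this.label = label;
        this.total = total;
    }

    @Override
    public int compareTo(Tally other) {
        // other.label is private, yet accessible here because this code
        // is inside the Tally class itself.
        return this.label.compareTo(other.label);
    }

    @Override
    public String toString() {
        return label + " " + total;
    }

    public static void main(String[] args) {
        Tally ed = new Tally("ED", 2);
        Tally bob = new Tally("BOB", 3);
        // Positive result: "ED" sorts after "BOB".
        System.out.println(ed.compareTo(bob) > 0);  // true
    }
}
```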

Q: Does the number returned by the countNames() method of Document include duplicates or not?
A: Yes, a legal name is counted every time it appears.

Q: Is Collections.sort() a stable sort?
A: Yes.
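
A quick way to see what stability means here: in the sketch below, the list is sorted alphabetically first and then sorted again by count alone. Because Collections.sort() does not reorder elements that compare as equal, names with the same count stay in alphabetical order after the second pass. StableSortSketch and its nested Entry class are hypothetical names used only for this demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demonstration of sort stability; not part of the skeletons.
class StableSortSketch {

    static final class Entry {
        final String name;
        final int count;

        Entry(String name, int count) {
            this.name = name;
            this.count = count;
        }

        @Override
        public String toString() {
            return name + " " + count;
        }
    }

    /** Sorts by name, then by descending count, relying on stability for ties. */
    static List<Entry> twoPassSort(List<Entry> input) {
        List<Entry> sorted = new ArrayList<>(input);
        // Pass 1: alphabetical by name.
        Collections.sort(sorted, (a, b) -> a.name.compareTo(b.name));
        // Pass 2: by count only, highest first. Equal counts keep pass 1 order.
        Collections.sort(sorted, (a, b) -> Integer.compare(b.count, a.count));
        return sorted;
    }

    public static void main(String[] args) {
        List<Entry> input = Arrays.asList(
                new Entry("TOM", 3), new Entry("ED", 2),
                new Entry("MARY", 3), new Entry("BOB", 3));
        System.out.println(twoPassSort(input));  // [BOB 3, MARY 3, TOM 3, ED 2]
    }
}
```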