Lab Activity
Famous Names
Overview - Frequency Analysis of Famous Names
Using an online database, the instructor has created a file of
nearly 34,000 names of famous people. Here's a brief excerpt:
Linus Torvalds
Arturo Toscanini
Peter Tosh
Nina Totenberg
Blanche M. Touhill
Marie J. Toulantis
Henri de Toulouse-Lautrec
Let's imagine someone is curious to know what the most common first
names are.
We want a program that can read a text file of names and analyze it
to determine the frequency of occurrence of first names.
Assume that there is only one name per line and that the person's
first name starts at the beginning of the line and ends at the first
blank. Valid first names contain only alphabetic characters and have
more than one letter. Ignore invalid first names such as J'Nae.
Disregard differences between lower and upper case letters.
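The rules above fit in one short method. This is a minimal sketch built around a hypothetical helper named FirstNameParser; the provided skeletons may put this logic somewhere else. It makes two assumptions: "alphabetic" means the ASCII letters A-Z and a-z, and "blank" means a space character. A line with no blank, such as "Prince", is used whole as the first name, which matches the PRINCE entry in the sample output.

```java
import java.util.Locale;

// Hypothetical helper; the provided skeletons may organise this differently.
class FirstNameParser {
    /**
     * Returns the first name on the line in upper case, or null when the
     * line does not start with a valid first name.
     */
    static String firstName(String line) {
        // The first name runs from the start of the line to the first blank
        // (assumed to mean a space); with no blank, the whole line is used.
        int blank = line.indexOf(' ');
        String candidate = (blank < 0) ? line : line.substring(0, blank);

        // Valid: two or more characters, all letters (assumed ASCII A-Z, a-z).
        // This rejects names such as "J'Nae", initials such as "J", and an
        // empty candidate.
        if (!candidate.matches("[A-Za-z]{2,}")) {
            return null;
        }
        // Upper-casing makes "bob" and "Bob" count as the same name.
        return candidate.toUpperCase(Locale.ROOT);
    }
}
```

With this sketch, firstName("Henri de Toulouse-Lautrec") returns "HENRI" and firstName("J'Nae Smith") returns null.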
The program should produce two kinds of reports: one with the names
sorted alphabetically, and one sorted by the number of times each
name was found in the text (highest count first), with ties broken by
upper-case name in alphabetical order. Each line of a report contains
the name and its count. The kind of report to generate is specified by
a command-line argument: in the sample executions below, -F produces
the frequency report and no argument produces the alphabetical one.
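Standard collections can produce both orderings. This sketch assumes the counts are already stored in a Map from upper-case name to count. The class and method names are hypothetical and do not come from the skeletons. The alphabetical report uses a TreeMap, which iterates its keys in sorted order. The frequency report sorts with a comparator that compares counts in descending order and then compares names alphabetically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical report helpers; names are illustrative only.
class ReportSketch {
    // Highest count first; equal counts fall back to alphabetical order of
    // the (already upper-case) name.
    static final Comparator<Map.Entry<String, Integer>> BY_FREQUENCY = (a, b) -> {
        int byCount = Integer.compare(b.getValue(), a.getValue());
        return (byCount != 0) ? byCount : a.getKey().compareTo(b.getKey());
    };

    // Alphabetical report: a TreeMap keeps its keys in sorted order.
    static List<String> alphabeticalReport(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }

    // Frequency report: copy the entries into a list and sort them.
    static List<String> frequencyReport(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(BY_FREQUENCY);
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }
}
```

Given the counts from the sample output, frequencyReport returns the lines in the same order as the -F run, and alphabeticalReport matches the run with no argument.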
Given the data file samplenames.txt, the sample executions are:
java NameFrequencyReport -F < samplenames.txt
BOB 3
MARY 3
TOM 3
ED 2
JILL 1
MARK 1
PRINCE 1
REVEREND 1
java NameFrequencyReport < samplenames.txt
BOB 3
ED 2
JILL 1
MARK 1
MARY 3
PRINCE 1
REVEREND 1
TOM 3
Note that even though "REVEREND" is a title, because it appears at
the start of the line we will treat it as a first name.
The application should read the input text from standard
input. Do not print any prompt to the user.
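Reading standard input and checking the command-line argument take only a few lines. The following is a minimal sketch. InputSketch and its methods are hypothetical, and the skeleton's main class may organise this differently. The reader is passed in as a parameter, so the same method can read System.in or a test string.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical input helpers; the skeleton's main class may differ.
class InputSketch {
    // The sample runs use "-F" for the frequency report and no argument
    // for the alphabetical report.
    static boolean wantsFrequencyReport(String[] args) {
        return args.length > 0 && args[0].equals("-F");
    }

    // Reads every line until end of input without printing a prompt, so
    // redirected input ("< samplenames.txt") works unchanged. An I/O error
    // surfaces as an UncheckedIOException from the stream.
    static List<String> readAllLines(Reader source) {
        return new BufferedReader(source).lines().collect(Collectors.toList());
    }
}
```

In main, a call such as readAllLines(new InputStreamReader(System.in)) would collect the redirected file one line per name.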
Software Design
The design has been created for you, and the class skeletons are
provided in skeletons.zip.
You are to implement the bodies of these skeletons.
Testing
You must write a JUnit test class for each class except the main
class.
If you would like to run your program on the entire list of nearly
34,000 names, the file famousnames.txt (516 KB) is available.
Submission
Submit a zip file of your BlueJ project folder.
FAQ
Q: In the NameCount class, are you sure that we are not to have a
getName() method? Otherwise, we would have to do string parsing when
we're trying to implement the compareTo() method.
A: There is no getName() method and you don't have to do any string
parsing. I suspect you are forgetting that compareTo() has access to
all of an object's fields. Perhaps this reference
will help.
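The answer above rests on a Java access rule: private access applies per class, not per object. A method of one NameCount can therefore read the private fields of another NameCount directly. This sketch illustrates the rule. The field names (name, count) and the frequency-then-name ordering are assumptions, and the skeleton's NameCount may differ in both.

```java
// Hypothetical stand-in for the skeleton's NameCount class; the field
// names and the ordering below are assumptions for illustration.
class NameCount implements Comparable<NameCount> {
    private final String name;
    private final int count;

    NameCount(String name, int count) {
        this.name = name;
        this.count = count;
    }

    // Private access is per class, so other.name and other.count are
    // readable here without any getName() method or string parsing.
    @Override
    public int compareTo(NameCount other) {
        if (count != other.count) {
            return Integer.compare(other.count, count); // higher count first
        }
        return name.compareTo(other.name);             // then alphabetical
    }

    @Override
    public String toString() {
        return name + " " + count;
    }
}
```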
Q: Does the number returned by the countNames() method of Document
include duplicates or not?
A: Yes, a legal name is counted every time it appears.
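The following illustrates that answer. It assumes the legal first names have already been extracted, and TallySketch and its methods are hypothetical rather than the Document API. Each occurrence adds one to its name's tally, so the total includes duplicates and is not the number of distinct names.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tally helpers; not the skeleton's Document API.
class TallySketch {
    // One increment per occurrence, ignoring case differences.
    static Map<String, Integer> tally(List<String> legalFirstNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : legalFirstNames) {
            counts.merge(name.toUpperCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    // Total number of legal names, duplicates included.
    static int totalNames(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }
}
```

For the sample report shown earlier, the counts add up to 15, although there are only 8 distinct names.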
Q: Is Collections.sort() a stable sort?
A: Yes.
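Stability allows the frequency report to be built in two passes. The first pass puts the entries in alphabetical order, and the second sorts them on count alone. Names with equal counts keep their alphabetical order because the Collections.sort documentation guarantees that equal elements are not reordered. The class name below is hypothetical. A single comparator that compares count and then name would work just as well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demonstration class; not part of the provided design.
class StableSortDemo {
    static List<String> frequencyOrder(Map<String, Integer> counts) {
        // Pass 1: a TreeMap iterates its entries in alphabetical key order.
        Map<String, Integer> alphabetical = new TreeMap<>(counts);
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(alphabetical.entrySet());

        // Pass 2: sort on count alone, highest first. Collections.sort is
        // stable, so equal counts keep the alphabetical order from pass 1.
        Collections.sort(entries, (a, b) -> Integer.compare(b.getValue(), a.getValue()));

        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add(e.getKey() + " " + e.getValue());
        }
        return lines;
    }
}
```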