CPE 103 Hashing Lab
Read all directions before your start.
Part 1
Step 1 gives you some test data for this activity, as well as
practice with Quadratic Probing.
1. Hash the following keys into a table of size 21 using a hash
function of h(x) = x mod 21. Resolve collisions with quadratic
probing.
7, 22, 44, 43, 27, 89, 30, 64, 85
Draw a diagram of the completed table.
For this activity we are going to study Weiss's implementation of a
hashtable.
The following files are needed.
Hashable.java:
Hashable interface
HashEntry.java:
Implementation for quadratic probing hash table (entry class)
QuadraticProbingHashTable.java:
Implementation for quadratic
probing hash table
Comparable.java:
Comparable interface for pre-1.2 Java
MyInteger.java:
Integer interface for pre-1.2 Java
You may remove the "package DataStructures;" statement at
the
top of each file.
2. Make sure it compiles and executes correctly.
3. Add a new accessor method,
public HashEntry[] getTable()
that returns a copy of the underlying
array for us to examine.
4. Write a JUnit test to demonstrate that the program is working
correctly. That is, we want to verify that items inserted
into the hash table are actually placed in the correct spot in the
array according to the
Quadratic Probing technique.
Create a JUnit test that inserts the numbers from the exercise
above into Weiss's table. Then invoke the getTable()
method to obtain
the completed table. Write nine assert statements to confirm that
each number was placed
in the proper location in the table.
5. Convert QuadraticProbingHashTable.java to meet
the
class coding standards (except the main()
method).
- Always use braces, even if the block has a only single
statement.
- Use while statements for conditional loops, not for.
- Convert ternary expressions to if statements.
6. Add a comment to every logic construct that
explains
what it does, in terms of the abstraction,
not in terms of Java operations.
For example,
for( int i = 0; i < key.length( ); i++ )
Don't say "Loops from zero to key length",
Say "Examine every letter in the key."
The goal is that you should thoroughly understand how the
implementation operates and accomplishes its tasks.
7. Run your JUnit tests and verify they still pass. If
not, correct the defects.
Part 2
Warning: These instruction do not tell you everything you need to
know to solve this problem. Some details are intentionally
omitted; you have to figure these out on your own. Tip: Don't
reinvent the wheel. You don't need to write your own hash function.
We will be doing an experiment to determine if the nature of the
input data effects the performance of a hashtable. Begin by
modifying the author's Quadratic Probing hashtable class from the
previous lab so that it will count the number of probes that are
performed to insert an item in the hashtable. An item inserted with
no collisions count as one probe. Each spot that collides counts as
one probe.
Manually compute the number of probes that should result from the
numeric data in exercise 1 above. Run the tests to
verify your probe counter is working properly.
Draw a diagram of the hashtable of size 23 that would result from
inserting these Strings in the following order: A, W, X, Y, CM, CK,
BD, H, MOB, BD. Manually compute the number of probes that should
result. (Tip: You can use the BlueJ IDE to assist with this).
(Tip: Don't invent your own String hash function. Use the one Weiss includes
in his class.)
Run your probe counter
with this data and verify that it is working properly.
Experiment:
Consider the effect of the following three data sets on the
performance of the hashtable:
- A common unix word dictionary, in alphabetical order:
http://www.robblackwell.org.uk/wp-content/words
Use the "wc" command to find out how many words it
contains.
- The text
file
containing "Huckleberry Finn". Modify the file in a text editor so it has
exactly the same number of words as the dictionary file.
- A common unix word dictionary, in random order:
(Tip: The unix sort command has a -R flag to randomly shuffle
the input).
Make a prediction based on your theoretical understanding of
hashing. Which would create the most collisions?
Why?
Now set the capacity of the table to be twice as large as the number
of words in the dictionary file. (Why does this matter?)
Write a driver to run the experiment, hashing all three files and
recording the total number of probes. The driver should NOT filter
duplicates from the input file. (If "the" appears twice, you should
call insert twice.) You might also print the number of
words processed, as a confirmation.
Calculate the average number of probes by dividing the total by the
number of words.
Do the results match your prediction? Is the average about the same
for all input types? If not, which is longer? Explain why.
Submission
Create a document containing your predictions, drawings,
exercise results, data collection, analysis, and conclusions.
(The drawings may be neatly hand illustrated).
Attach your source code and test results (unit and system tests).
References