CPE 103 Hashing Lab 2

Make careful notes of your complete experiment in your lab notebook.
Warning: These instruction do not tell you everything you need to know to solve this problem. Some details are intentionally omitted; you have to figure these out on your own. Ask the instructor if you get stuck, not other student teams.

Goal: We want to compare the performance of a hashtable when the input is evenly distributed versus when it is random. Make a prediction based on your theoretical understanding of hashing, and explain your prediction in your notebook.

For an example of evenly distributed input, here is a common unix word dictionary, in alphabetical order:
http://www.robblackwell.org.uk/wp-content/words
Use the "wc" command to find out how many words it contains.

For an example of randomly distributed, you can use the text file containing "Huckleberry Finn". Then edit the file so it has exactly the same number of words as the dictionary file.

Modify the author's Quadratic Probing hashtable class from the previous lab so that it will count the number of probes that are performed to insert a word in the hashtable. A word inserted with no collisions count as one probe. Each spot that collides counts as one probe. Manually compute the number of probes that should result from the data in exercise 1 of the previous lab. Run the tests to verify your probe counter is working properly.

Set the capacity to be twice as large as the number of words in the dictionary file.   (Why does this matter?)

Write a driver to run the experiment, hashing both files and recording the total number of probes. The driver should NOT filter duplicates from the input file. You might also print the number of words processed, as a confirmation.

Calculate the average number of probes by dividing the total by the number of words.
Do the results match your prediction? Is the average the same for both input types, evenly distributed and random.  If not, which is longer? Explain why.

Show your notebook with your data and conclusions to the instructor.