Introduction to Hashing

Suppose we want to track contributions of six friends to club dues.

Alex, Doug, Frank, Julie, Susan, Wendy

Their contributions come at unpredictable times, so the input looks like this:

Doug  5
Wendy  3
Alex  10
Doug  2
Frank  5
etc

The bonehead approach would be to create a List and do a linear search to lookup each name. The smart approach is to use a Map. Then looking up a name is simply a call to get(). How does the Map actually work?

In the Collections framework, HashMap is an implementation of the Map interface that uses hashing.

Perhaps the most obvious hashing technique is to take the first letter of the name and convert it to an integer. Assuming it's an ASCII character, it will an integer between 0 and 255. So we can then use the integer as an index into a 256-element array.

The content of each element is the total dues paid.

It's true we won't be utilizing all 256 elements of the array, so if that bothers you, subtract 'A' before converting, then it fits into a 26-element array.


$10$7$5
01234567891011121314
ABCDEFGHIJKLMNO


View Implementation

If the friends happened to be named
Alan, Betty, Cory, Dan, Ernie, Fay
we could fit them into a 6 element table because the first letters of the names conveniently are one of the first six letters of the alphabet.

But what if the friends were named Alfred, Alessia, Amina, Amy, Andy, Anne? Now the first letters are all the same.







Answer: Use the 3rd letter.
But then we're back to a 26-element table.
Is there a way to fit them into a 6-element table?





Key3rd letteras integermodulo 6
Alfredf60
Alessiae55
Aminai93
Amyy251
Andyd44
Anneh142

So the trick is to modulo by the table size. This is a very common hashing technique.

Note:
A lookup table can find an item in constant time (much faster than linear time).
However, the table can only be indexed by an integer value.
the table can hold only a single element.

A Hash function is a method to transform a key so that it can be used in a lookup table.

Step 1:  Transform the key into an integer
Step 2:  Fit the integer into the range of valid indices for your desired table size.

Usually "fitting" means modulo table size.

Example transformations:
Mapping - arithmetic transformation into integer
Folding - partition the key, then combine (e.g. add or multiply)
Shifting - partition and move left or right
Casting - e.g. character to integer


If Alan joins the previous club, is third letter is 'a', which "collides" with Amy.

A hash function which can map each key to a unique spot in the table is
a "perfect" hashing function.

A perfect hashing function which uses a table whose size equals the number of keys
is a "minimal perfect" hashing function.

Non-perfect hashing functions must incorporate a collision resolution strategy.

If you have a small set of keys, it's worth spending a little time to see if you
might invent a perfect or near-perfect hashing function.

Otherwise, just use a general Hash ADT from a library, such as Java's HashMap. It has collision-resolution built in so you don't have to deal with all that messy stuff.