Ideally, the hash function will assign each key to a unique bucket, but this situation is rarely achievable in practice (usually some keys will hash to the same bucket). Instead, most hash table designs assume that hash collisions, different keys that are assigned by the hash function to the same bucket, will occur and must be accommodated in some way (Later we use Linked List data structure, that was discussed in this blog, to remedy this potential problem).
You can find an informative video discussing Hashtables by Paul Programming here.
Basic Hashtable Operations
The Hashtable element is represented in HashtableNode private class and Hashtable functionalities are represented in Hashtable class. This class is generic.
import uk.ac.soton.ecs.datastructure.linkedlist.LinkedList;
public class Hashtable<K, V> {
private class HashtableNode {
private final K key;
private V value;
public HashtableNode(K key, V value) {
this.key = key;
this.value = value;
}
public V getValue() {
return value;
}
public void setValue(V value) {
this.value = value;
}
public K getKey() {
return key;
}
@Override
public boolean equals(Object o) {
if (o == null || this.getClass() != o.getClass()) {
return false;
}
HashtableNode node = (HashtableNode) o;
return this.key.equals(node.getKey());
}
@Override
public String toString() {
return "Key: [" + this.key + "] data: [" + this.value + "]";
}
}
private final int DEFAULT_CAPACITY = 20;
private int size;
private Object[] table;
public Hashtable() {
this.size = 0;
this.table = new Object[DEFAULT_CAPACITY];
}
}
Note that we use LinkedList implementation that we discussed here.
Hash function
We will use Java's toString() method that exists for any Object. The hash method adds up the ASCII values of the characters in the string. For characters between a and f, we use the hex value instead. Once we've got a number, we find the modulus of the number with the table size so that we can fit into the backing array. // Hash function
private int hash(K key) {
// Start with a base, just so that it's not 0 for empty strings
int result = 42;
// Convert key to array of characters
String inputString = key.toString().toLowerCase();
char[] characters = inputString.toCharArray();
for (int i = 0; i < characters.length; i++) {
char currentChar = characters[i];
// hexadecimal numbers are base 16
// the hexadecimal "digits" are 0123456789ABCDEF
if (currentChar == 'a' || currentChar == 'b' || currentChar == 'c'
|| currentChar == 'd' || currentChar == 'e'
|| currentChar == 'f') {
result += Integer.parseInt("" + currentChar, 16);
}
int j = (int) currentChar;
result += j;
}
return (result % this.DEFAULT_CAPACITY);
}
Put
To put an element to the hashtable, we find the position in the backing array using the hash() method on the key. Then, if there is a linked list already in that position, we append to it by creating a new HashtableNode object and setting the key and data provided and adding that to the list. If there is no linked list in that position, we create one and then add the HashtableNode to it.
// Put an element into the Hashtable
public void put(K key, V value) {
// Don't add null key or data
if (key == null || value == null) {
return;
}
// Don't add duplicate keys
if (this.get(key) != null) {
return;
}
// Find out where in our array should the item go
int position = this.hash(key);
// If nothing exists in the position, create a new linked list there
if (this.table[position] == null) {
this.table[position] = new LinkedList<HashtableNode>();
}
// Add to the linked list in the appropriate position
LinkedList<HashtableNode> ll = (LinkedList<HashtableNode>) this.table[position];
HashtableNode node = new HashtableNode(key, value);
ll.addLast(node);
this.size++;
}
Get
// Get an element from the Hashtable
public V get(K key) {
if (key != null) {
int position = this.hash(key);
if (this.table[position] != null) {
LinkedList<HashtableNode> ll = (LinkedList<HashtableNode>) this.table[position];
Object[] nodes = ll.iterator();
for (Object node : nodes) {
HashtableNode hashtableNode = (HashtableNode) node;
if (hashtableNode.getKey().equals(key)) {
return hashtableNode.getValue();
}
}
}
}
return null;
}
Remove
To remove an element from the hashtable, we find the position in the backing array using the hash() method on the key provided. Then, if there is a linked list already in that position, we remove the element from it. If there is no linked list in that position, the element does not exist in the hash table so we do nothing.
// Remove a key from the Hashtable
public boolean remove(K key) {
if (key != null) {
int position = this.hash(key);
if (this.table[position] != null) {
LinkedList<HashtableNode> ll = (LinkedList<HashtableNode>) this.table[position];
HashtableNode deletedNode = new HashtableNode(key,
this.get(key));
boolean result = ll.remove(deletedNode);
return result;
}
}
return false;
}
Evaluation
Complexity Analysis
Functionality | Complexity |
---|---|
Search | Average: O(1) Worst Case: O(n) |
Insert | Average: O(1) Worst Case: O(n) |
Delete | Average: O(1) Worst Case: O(n) |
Advantages
The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large. Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the optimum size and never resized.
Disadvantages
Although operations on a hash table take constant time on average, the cost of a good hash function can be significantly higher than the inner loop of the lookup algorithm for a sequential list or search tree. Thus hash tables are not effective when the number of entries is very small.
Summary
Here goes a well-written explanation of Hashtables by Lasse V. Karlsen:
Let's assume you want to fill up a library of books, and not just stuff them in there, but you want to be able to easily find them again when you need them.
So, you decide that if the person that wants to read a book knows the title of the book, and the exact title to boot, then that's all it should take. With the title, the person, with the aid of the librarian, should be able to go find the book easily and quickly.
So, how can you do that? Well, obviously you can keep some kind of list of where you put each book, but then you have the same problem as searching the library, you need to search the list. Granted, the list would be smallers, and easier to search, but still, you don't want to search sequentially from one end of the library (or list) to the other.
You want something that, with the title of the book, can give you the right spot at once, so all you have to do is just stroll over to the right shelf, and pick up the book.
But how can that be done? Well, with a bit of forethought when you fill up the library, and actually, a lot of work when you fill up the library.
Instead of just starting to fill up the library from one end to the other, you devise a clever little method. You take the title of the book, run it through a small computer program, which spits out a shelf number and a slot number on that shelf. This is where you place the book.
The beauty of this program is that later on, when a person comes back in to read the book, you feed the title through the program once more, and get back the same shelf number and slot number that you were originally given, and this is where the book is located.
The program, as others have already mentioned, is called a hash algorithm or hash computation, and usually works by taking the data fed into it (the title of the book in this case) and calculates a number from it.
For simplicity, let's say that it just converts each letter and symbol into a number, and sums them all up. In reality it's a lot more complicated than that, but let's leave it at that for now.
The beauty of such an algorithm is that if you feed the same input into it again and again, it will keep spitting out the same number each time.
Ok, so that's basically how a hash table works.
Technical stuff follows.
First, there's the size of the number. Usually, the output of such a hash algorithm is inside a range of some large number, typically much much larger than the space you have in your table. For instance, let's say that we have room for exactly one million books in the library. The output of the hash calculation could be in the range of 0 to one billion, a lot higher.
So what do we do? We use something called modulus calculation, which basically says that if you counted to the number you wanted (ie. the one billion number), but wanted to stay inside a much smaller range, each time you hit the limit of that smaller range, you started back at 0, but you have to keep track of how far in the big sequence you've come.
Say that the output of the hash algorithm is in the range of 0 to 20, and you get the value 17 from a particular title. If the size of the library is only 7 books, you count 0, 1, 2, 3, 4, 5, 6, and when you get to 7, you start back at 0. Since we need to count 17 times, we have 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, and the final number is 3.
Of course modulus calculation isn't done like that, it's done with division, and remainder. The remainder of dividing 17 by 7 is 3 (7 goes 2 times into 17, to 14, and the difference between 17 and 14 is 3).
Thus, you put the book in slot nr. 3.
This leads to the next problem. Collisions. Since the algorithm has no way to space out the books so that they fill the library exactly (or the hash table if you will), it will invariably end up calculating a number that has been used before. In the library sense, when you get to the shelf and the slot number you wish to put a book in, there's already a book there.
Various collision handling methods exist, including running the data into yet another calculation to get another spot in the table, or simply to find a space close to the one you were given (i.e. right next to the previous book). This would mean that you have some digging to do when you try to find the book later, but it's still better than simply starting at one end of the library.
Finally, at some point, you might want to put more books into the library than the library allows, in other words, you need to build a bigger library. Since the exact spot in the library was calculated using the exact, and current, size of the library, it goes to follow that if you resize the library, you might end up having to find new spots for all the books, since the calculation done to find their spots has changed.
Source Code
import uk.ac.soton.ecs.datastructure.linkedlist.LinkedList;
public class Hashtable<K, V> {
private class HashtableNode {
private final K key;
private V value;
public HashtableNode(K key, V value) {
this.key = key;
this.value = value;
}
public V getValue() {
return value;
}
public void setValue(V value) {
this.value = value;
}
public K getKey() {
return key;
}
@Override
public boolean equals(Object o) {
if (o == null || this.getClass() != o.getClass()) {
return false;
}
HashtableNode node = (HashtableNode) o;
return this.key.equals(node.getKey());
}
@Override
public String toString() {
return "Key: [" + this.key + "] data: [" + this.value + "]";
}
}
private final int DEFAULT_CAPACITY = 20;
private int size;
private Object[] table;
public Hashtable() {
this.size = 0;
this.table = new Object[DEFAULT_CAPACITY];
}
// Hash function
private int hash(K key) {
// Start with a base, just so that it's not 0 for empty strings
int result = 42;
// Convert key to array of characters
String inputString = key.toString().toLowerCase();
char[] characters = inputString.toCharArray();
for (int i = 0; i < characters.length; i++) {
char currentChar = characters[i];
// hexadecimal numbers are base 16
// the hexadecimal "digits" are 0123456789ABCDEF
if (currentChar == 'a' || currentChar == 'b' || currentChar == 'c'
|| currentChar == 'd' || currentChar == 'e'
|| currentChar == 'f') {
result += Integer.parseInt("" + currentChar, 16);
}
int j = (int) currentChar;
result += j;
}
return (result % this.DEFAULT_CAPACITY);
}
// Put an element into the Hashtable
public void put(K key, V value) {
// Don't add null key or data
if (key == null || value == null) {
return;
}
// Don't add duplicate keys
if (this.get(key) != null) {
return;
}
// Find out where in our array should the item go
int position = this.hash(key);
// If nothing exists in the position, create a new linked list there
if (this.table[position] == null) {
this.table[position] = new LinkedList<HashtableNode>();
}
// Add to the linked list in the appropriate position
LinkedList<HashtableNode> ll = (LinkedList<HashtableNode>) this.table[position];
HashtableNode node = new HashtableNode(key, value);
ll.addLast(node);
this.size++;
}
// Get an element from the Hashtable
public V get(K key) {
if (key != null) {
int position = this.hash(key);
if (this.table[position] != null) {
LinkedList<HashtableNode> ll = (LinkedList<HashtableNode>) this.table[position];
Object[] nodes = ll.iterator();
for (Object node : nodes) {
HashtableNode hashtableNode = (HashtableNode) node;
if (hashtableNode.getKey().equals(key)) {
return hashtableNode.getValue();
}
}
}
}
return null;
}
// Remove a key from the Hashtable
public boolean remove(K key) {
if (key != null) {
int position = this.hash(key);
if (this.table[position] != null) {
LinkedList<HashtableNode> ll = (LinkedList<HashtableNode>) this.table[position];
HashtableNode deletedNode = new HashtableNode(key,
this.get(key));
boolean result = ll.remove(deletedNode);
return result;
}
}
return false;
}
@Override
public String toString() {
StringBuffer buffer = new StringBuffer();
buffer.append(System.getProperty("line.separator"));
buffer.append("{");
buffer.append(System.getProperty("line.separator"));
for (int i = 0; i < this.table.length; i++) {
if (this.table[i] != null) {
buffer.append("\t"
+ ((LinkedList<HashtableNode>) this.table[i])
.toString());
buffer.append(System.getProperty("line.separator"));
}
}
buffer.append("}");
return buffer.toString();
}
public int getSize() {
return this.size;
}
}
JUnit Test
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;
public class HashtableTest {
@Test
public void testHashtable() {
Hashtable<String, String> hashtable = new Hashtable<String, String>();
assertTrue(0 == hashtable.getSize());
}
@Test
public void testPut() {
Hashtable<String, String> hashtable = new Hashtable<String, String>();
hashtable.put("Number1", "One");
hashtable.put("Number2", "Two");
assertTrue("One".equals(hashtable.get("Number1")));
assertTrue("Two".equals(hashtable.get("Number2")));
}
@Test
public void testGet() {
Hashtable<String, String> hashtable = new Hashtable<String, String>();
hashtable.put("Number1", "One");
hashtable.put("Number2", "Two");
assertTrue("One".equals(hashtable.get("Number1")));
assertFalse("One".equals(hashtable.get("Number2")));
assertTrue("Two".equals(hashtable.get("Number2")));
}
@Test
public void testRemove() {
Hashtable<String, String> hashtable = new Hashtable<String, String>();
hashtable.put("Number1", "One");
hashtable.put("Number2", "Two");
hashtable.put("Number3", "Three");
assertTrue(hashtable.remove("Number2"));
assertFalse(hashtable.remove("Number4"));
}
@Test
public void testToString() {
Hashtable<String, String> hashtable = new Hashtable<String, String>();
hashtable.put("Number1", "One");
hashtable.put("Number2", "Two");
assertTrue(hashtable.toString().contains("Number1"));
}
@Test
public void testGetSize() {
Hashtable<String, String> hashtable = new Hashtable<String, String>();
hashtable.put("Number1", "One");
hashtable.put("Number2", "Two");
assertTrue(2 == hashtable.getSize());
}
}
No comments:
Post a Comment