In previous sections we were able to make improvements in our search algorithms by taking advantage of information about where items are stored in the collection with respect to one another. For example, by knowing that a list was ordered, we could search in logarithmic time using a binary search. In this section we will attempt to go one step further by building a data structure that can be searched in $O(1)$ time. This concept is referred to as hashing.
In order to do this, we will need to know even more about where the items might be when we go to look for them in the collection. If every item is where it should be, then the search can use a single comparison to discover the presence of an item. We will see, however, that this is typically not the case.
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on. Initially, the hash table contains no items so every slot is empty. We can implement a hash table by using a list with each element initialized to the special Python value `None`. Figure 4 shows a hash table of size $m = 11$. In other words, there are $m$ slots in the table, named 0 through 10.
The mapping between an item and the slot where that item belongs in the hash table is called the hash function. The hash function will take any item in the collection and return an integer in the range of slot names, between 0 and $m-1$. Assume that we have the set of integer items 54, 26, 93, 17, 77, and 31. Our first hash function, sometimes referred to as the 'remainder method,' simply takes an item and divides it by the table size, returning the remainder as its hash value ($h(\text{item}) = \text{item} \bmod 11$). Table 4 gives all of the hash values for our example items. Note that this remainder method (modulo arithmetic) will typically be present in some form in all hash functions, since the result must be in the range of slot names.
Item | Hash Value |
---|---|
54 | 10 |
26 | 4 |
93 | 5 |
17 | 6 |
77 | 0 |
31 | 9 |
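As a quick check on Table 4, here is the remainder method as a small Python function. The name `remainder_hash` and the default table size of 11 are choices made for this sketch.

```python
def remainder_hash(item, table_size=11):
    """Remainder method: map an integer item to a slot in 0..table_size-1."""
    return item % table_size


items = [54, 26, 93, 17, 77, 31]
for item in items:
    print(item, remainder_hash(item))
```

Running it should print the same item/hash pairs as Table 4.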
Once the hash values have been computed, we can insert each item into the hash table at the designated position as shown in Figure 5. Note that 6 of the 11 slots are now occupied. The fraction of occupied slots is referred to as the load factor, and is commonly denoted by $\lambda = \frac{\text{number of items}}{\text{table size}}$. For this example, $\lambda = \frac{6}{11}$.
Figure 5: Hash Table with Six Items
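A minimal sketch of the steps just described: build an empty table as a list of `None` values, place each of the six items in its remainder-method slot, and compute the load factor. It assumes no two items collide, which happens to hold for these six values; the variable names are our own.

```python
table_size = 11
hash_table = [None] * table_size  # every slot starts out empty

# Place each item in the slot chosen by the remainder method.
for item in [54, 26, 93, 17, 77, 31]:
    hash_table[item % table_size] = item

occupied = sum(slot is not None for slot in hash_table)
load_factor = occupied / table_size
print(hash_table)
print(f"load factor = {occupied}/{table_size} = {load_factor:.3f}")
```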
Now when we want to search for an item, we simply use the hash function to compute the slot name for the item and then check the hash table to see if it is present. This searching operation is $O(1)$, since a constant amount of time is required to compute the hash value and then index the hash table at that location. If everything is where it should be, we have found a constant time search algorithm.
You can probably already see that this technique is going to work only if each item maps to a unique location in the hash table. For example, if the item 44 had been the next item in our collection, it would have a hash value of 0 ($44 \bmod 11 = 0$). Since 77 also had a hash value of 0, we would have a problem. According to the hash function, two or more items would need to be in the same slot. This is referred to as a collision (it may also be called a 'clash'). Clearly, collisions create a problem for the hashing technique. We will discuss them in detail later.
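To make the constant-time lookup and the collision problem concrete, the sketch below uses two hypothetical helpers, `naive_insert` and `naive_contains`, that only ever use an item's home slot. Lookup is a single index operation, but the scheme has no place for 44, which is reported as a collision with 77 instead of being stored.

```python
def naive_insert(table, item):
    """Store item in its home slot; refuse (rather than resolve) a collision."""
    slot = item % len(table)
    if table[slot] is not None:
        raise ValueError(f"collision: {item} and {table[slot]} both hash to slot {slot}")
    table[slot] = item


def naive_contains(table, item):
    """O(1) lookup that is only correct when no collisions have occurred."""
    return table[item % len(table)] == item


table = [None] * 11
for item in [54, 26, 93, 17, 77, 31]:
    naive_insert(table, item)

print(naive_contains(table, 93))  # True
print(naive_contains(table, 44))  # False
try:
    naive_insert(table, 44)
except ValueError as err:
    print(err)  # collision: 44 and 77 both hash to slot 0
```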
6.5.1. Hash Functions
Given a collection of items, a hash function that maps each item into a unique slot is referred to as a perfect hash function. If we know the items and the collection will never change, then it is possible to construct a perfect hash function (refer to the exercises for more about perfect hash functions). Unfortunately, given an arbitrary collection of items, there is no systematic way to construct a perfect hash function. Luckily, we do not need the hash function to be perfect to still gain performance efficiency.
One way to always have a perfect hash function is to increase the size of the hash table so that each possible value in the item range can be accommodated. This guarantees that each item will have a unique slot. Although this is practical for small numbers of items, it is not feasible when the number of possible items is large. For example, if the items were nine-digit Social Security numbers, this method would require almost one billion slots. If we only want to store data for a class of 25 students, we will be wasting an enormous amount of memory.
Our goal is to create a hash function that minimizes the number of collisions, is easy to compute, and evenly distributes the items in the hash table. There are a number of common ways to extend the simple remainder method. We will consider a few of them here.
The folding method for constructing hash functions begins by dividing the item into equal-size pieces (the last piece may not be of equal size). These pieces are then added together to give the resulting hash value. For example, if our item was the phone number 436-555-4601, we would take the digits and divide them into groups of 2 (43, 65, 55, 46, 01). After the addition, $43 + 65 + 55 + 46 + 01$, we get 210. If we assume our hash table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the remainder. In this case $210 \bmod 11$ is 1, so the phone number 436-555-4601 hashes to slot 1. Some folding methods go one step further and reverse every other piece before the addition. For the above example, we get $43 + 56 + 55 + 64 + 01 = 219$, which gives $219 \bmod 11 = 10$.
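The folding method can be sketched in a few lines of Python. The function name `fold_hash`, its `group_size` and `reverse_alternate` parameters, and the choice to ignore non-digit characters such as dashes are assumptions for this illustration.

```python
def fold_hash(key, table_size, group_size=2, reverse_alternate=False):
    """Folding method: split the key's digits into equal pieces and add them."""
    digits = "".join(ch for ch in key if ch.isdigit())  # drop dashes etc.
    pieces = [digits[i:i + group_size] for i in range(0, len(digits), group_size)]
    if reverse_alternate:
        # Variant: reverse every other piece (the 2nd, 4th, ...) before adding.
        pieces = [p[::-1] if i % 2 == 1 else p for i, p in enumerate(pieces)]
    return sum(int(p) for p in pieces) % table_size


print(fold_hash("436-555-4601", 11))                          # 1
print(fold_hash("436-555-4601", 11, reverse_alternate=True))  # 10
```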
Another numerical technique for constructing a hash function is called the mid-square method. We first square the item, and then extract some portion of the resulting digits. For example, if the item were 44, we would first compute $44^{2} = 1{,}936$. By extracting the middle two digits, 93, and performing the remainder step, we get 5 ($93 \bmod 11$). Table 5 shows items under both the remainder method and the mid-square method. You should verify that you understand how these values were computed.
Item | Remainder | Mid-Square |
---|---|---|
54 | 10 | 3 |
26 | 4 | 7 |
93 | 5 | 9 |
17 | 6 | 8 |
77 | 0 | 4 |
31 | 9 | 6 |
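The text does not say which digits count as the "middle" when the square has an odd number of digits. The values in Table 5 are consistent with taking the single middle digit in that case (for example, $26^2 = 676$ gives 7), so the sketch below assumes that convention; the name `mid_square_hash` is our own.

```python
def mid_square_hash(item, table_size=11):
    """Mid-square method: square the item, keep its middle digits, take the remainder."""
    squared = str(item * item)
    n = len(squared)
    if n % 2 == 0:
        middle = squared[n // 2 - 1 : n // 2 + 1]  # middle two digits
    else:
        middle = squared[n // 2]  # assumption: single middle digit for odd lengths
    return int(middle) % table_size


for item in [54, 26, 93, 17, 77, 31]:
    print(item, item % 11, mid_square_hash(item))
```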
We can also create hash functions for character-based items such as strings. The word 'cat' can be thought of as a sequence of ordinal values: `ord('c')` is 99, `ord('a')` is 97, and `ord('t')` is 116.
We can then take these three ordinal values, add them up, and use the remainder method to get a hash value (see Figure 6). Listing 1 shows a function called `hash` that takes a string and a table size and returns the hash value in the range from 0 to `tablesize-1`.
Figure 6: Hashing a String Using Ordinal Values
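A minimal sketch of the ordinal-value approach from Figure 6 follows. It is our own sketch rather than a reproduction of Listing 1, and we call it `string_hash` (a hypothetical name) so that it does not shadow Python's built-in `hash`.

```python
def string_hash(astring, tablesize):
    """Sum the ordinal values of the characters, then apply the remainder method."""
    total = 0
    for ch in astring:
        total += ord(ch)
    return total % tablesize


print([ord(ch) for ch in "cat"])  # [99, 97, 116]
print(string_hash("cat", 11))     # 312 % 11 == 4
```

Because addition ignores order, `string_hash("act", 11)` returns the same value as `string_hash("cat", 11)`.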
Listing 1
It is interesting to note that when using this hash function, anagrams will always be given the same hash value. To remedy this, we could use the position of the character as a weight. Figure 7 shows one possible way to use the positional value as a weighting factor. The modification to the `hash` function is left as an exercise.
Figure 7: Hashing a String Using Ordinal Values with Weighting
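One possible weighting multiplies each ordinal value by the character's 1-based position. This scheme is an assumption, since the details of Figure 7 are not reproduced here, and the function name `weighted_string_hash` is hypothetical; many other weightings would also separate anagrams.

```python
def weighted_string_hash(astring, tablesize):
    """Weight each character's ordinal value by its 1-based position (assumed scheme)."""
    total = 0
    for position, ch in enumerate(astring, start=1):
        total += position * ord(ch)
    return total % tablesize


print(weighted_string_hash("cat", 11))  # 641 % 11 == 3
print(weighted_string_hash("act", 11))  # 643 % 11 == 5
```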
You may be able to think of a number of additional ways to compute hash values for items in a collection. The important thing to remember is that the hash function has to be efficient so that it does not become the dominant part of the storage and search process. If the hash function is too complex, then it becomes more work to compute the slot name than it would be to simply do a basic sequential or binary search as described earlier. This would quickly defeat the purpose of hashing.
6.5.2. Collision Resolution
We now return to the problem of collisions. When two items hash to the same slot, we must have a systematic method for placing the second item in the hash table. This process is called collision resolution. As we stated earlier, if the hash function is perfect, collisions will never occur. However, since this is often not possible, collision resolution becomes a very important part of hashing.
One method for resolving collisions looks into the hash table and tries to find another open slot to hold the item that caused the collision. A simple way to do this is to start at the original hash value position and then move in a sequential manner through the slots until we encounter the first slot that is empty. Note that we may need to go back to the first slot (circularly) to cover the entire hash table. This collision resolution process is referred to as open addressing in that it tries to find the next open slot or address in the hash table. By systematically visiting each slot one at a time, we are performing an open addressing technique called linear probing.
Figure 8 shows an extended set of integer items under the simple remainder method hash function (54, 26, 93, 17, 77, 31, 44, 55, 20). Table 4 above shows the hash values for the original items. Figure 5 shows the original contents. When we attempt to place 44 into slot 0, a collision occurs. Under linear probing, we look sequentially, slot by slot, until we find an open position. In this case, we find slot 1.
Again, 55 should go in slot 0 but must be placed in slot 2 since it is the next open position. The final value of 20 hashes to slot 9. Since slot 9 is full, we begin to do linear probing. We visit slots 10, 0, 1, and 2, and finally find an empty slot at position 3.
Figure 8: Collision Resolution with Linear Probing
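The insertion procedure described above can be sketched as follows, assuming the remainder method and a 'plus 1' probe. `linear_probe_insert` is a name chosen for this example, and raising an error on a full table is one possible policy, not something the text prescribes.

```python
def linear_probe_insert(table, item):
    """Insert item using the remainder method and linear probing ("plus 1")."""
    size = len(table)
    start = item % size
    pos = start
    while table[pos] is not None:
        pos = (pos + 1) % size  # move to the next slot, wrapping around circularly
        if pos == start:
            raise RuntimeError("hash table is full")
    table[pos] = item
    return pos


table = [None] * 11
for item in [54, 26, 93, 17, 77, 31, 44, 55, 20]:
    linear_probe_insert(table, item)
print(table)
```

With the nine example items, the printed table holds 77, 44, 55, and 20 in slots 0 through 3, matching the probing steps traced above.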
Once we have built a hash table using open addressing and linear probing, it is essential that we utilize the same methods to search for items. Assume we want to look up the item 93. When we compute the hash value, we get 5. Looking in slot 5 reveals 93, and we can return `True`. What if we are looking for 20? Now the hash value is 9, and slot 9 is currently holding 31. We cannot simply return `False` since we know that there could have been collisions. We are now forced to do a sequential search, starting at position 10, looking until either we find the item 20 or we find an empty slot.
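A matching search can be sketched as follows. The table literal is the result of inserting the nine example items with linear probing; `linear_probe_search` is a hypothetical helper, not a function from the text.

```python
def linear_probe_search(table, item):
    """Search a table built with linear probing; stop at the item or an empty slot."""
    size = len(table)
    start = item % size
    pos = start
    while table[pos] is not None:
        if table[pos] == item:
            return True
        pos = (pos + 1) % size
        if pos == start:  # wrapped all the way around: every slot checked
            return False
    return False


# Contents after inserting 54, 26, 93, 17, 77, 31, 44, 55, 20 with linear probing.
table = [77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]
print(linear_probe_search(table, 93))  # True, found in home slot 5
print(linear_probe_search(table, 20))  # True, after probing slots 9, 10, 0, 1, 2, 3
print(linear_probe_search(table, 66))  # False, stops at empty slot 7
```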
A disadvantage to linear probing is the tendency for clustering; items become clustered in the table. This means that if many collisions occur at the same hash value, a number of surrounding slots will be filled by the linear probing resolution. This will have an impact on other items that are being inserted, as we saw when we tried to add the item 20 above. A cluster of values hashing to 0 had to be skipped to finally find an open position. This cluster is shown in Figure 9.
One way to deal with clustering is to extend the linear probing technique so that instead of looking sequentially for the next open slot, we skip slots, thereby more evenly distributing the items that have caused collisions. This will potentially reduce the clustering that occurs. Figure 10 shows the items when collision resolution is done with a 'plus 3' probe. This means that once a collision occurs, we will look at every third slot until we find one that is empty.
Figure 10: Collision Resolution Using 'Plus 3'
The general name for this process of looking for another slot after a collision is rehashing. With simple linear probing, the rehash function is $\text{newhashvalue} = \text{rehash}(\text{oldhashvalue})$ where $\text{rehash}(pos) = (pos + 1) \bmod \text{sizeoftable}$. The 'plus 3' rehash can be defined as $\text{rehash}(pos) = (pos + 3) \bmod \text{sizeoftable}$. In general, $\text{rehash}(pos) = (pos + \text{skip}) \bmod \text{sizeoftable}$. It is important to note that the size of the 'skip' must be such that all the slots in the table will eventually be visited, which holds exactly when the skip and the table size are relatively prime. Otherwise, part of the table will be unused. To ensure this, it is often suggested that the table size be a prime number. This is the reason we have been using 11 in our examples.
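The general rehash function, and the requirement that the skip eventually reach every slot, can be sketched as follows. The helper names are our own; `probe_insert` with `skip=3` follows the 'plus 3' strategy, and `slots_visited` shows that a skip of 3 covers all 11 slots of our table but only 4 of the 12 slots in a hypothetical table of size 12, because 3 and 12 share the factor 3.

```python
from math import gcd


def rehash(pos, skip, size):
    """General rehash function: (pos + skip) % size."""
    return (pos + skip) % size


def probe_insert(table, item, skip):
    """Insert item, probing every `skip`-th slot after a collision."""
    size = len(table)
    pos = item % size
    for _ in range(size):
        if table[pos] is None:
            table[pos] = item
            return pos
        pos = rehash(pos, skip, size)
    raise RuntimeError(f"no empty slot reachable from {item % size} with skip {skip}")


def slots_visited(start, skip, size):
    """Set of slots a probe sequence can ever reach from `start`."""
    seen, pos = set(), start
    while pos not in seen:
        seen.add(pos)
        pos = rehash(pos, skip, size)
    return seen


table = [None] * 11
for item in [54, 26, 93, 17, 77, 31, 44, 55, 20]:
    probe_insert(table, item, skip=3)
print(table)

print(len(slots_visited(0, 3, 11)), gcd(3, 11))  # 11 1: every slot is reachable
print(len(slots_visited(0, 3, 12)), gcd(3, 12))  # 4 3: only slots 0, 3, 6, 9
```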
A variation of the linear probing idea is called quadratic probing. Instead of using a constant 'skip' value, we use a rehash function that increments the hash value by 1, 3, 5, 7, 9, and so on. This means that if the first hash value is $h$, the successive values are $h+1$, $h+4$, $h+9$, $h+16$, and so on. In general, the $i$-th probe examines slot $(h + i^2) \bmod \text{sizeoftable}$. In other words, quadratic probing uses a skip consisting of successive perfect squares. Figure 11 shows our example values after they are placed using this technique.
Figure 11: Collision Resolution with Quadratic Probing
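A sketch of quadratic probing under the same assumptions as the earlier examples (remainder method, table size 11); the helper name is hypothetical. Probe positions are computed from the original hash value $h$ as $(h + i^2) \bmod 11$, and the loop gives up after as many attempts as there are slots, since quadratic probing is not guaranteed to reach every slot.

```python
def quadratic_probe_insert(table, item):
    """Insert item, probing h + 1, h + 4, h + 9, ... (mod table size) after a collision."""
    size = len(table)
    h = item % size
    for i in range(size):
        pos = (h + i * i) % size
        if table[pos] is None:
            table[pos] = item
            return pos
    # Quadratic probing does not necessarily visit every slot, so it can fail early.
    raise RuntimeError(f"no empty slot found for {item}")


table = [None] * 11
for item in [54, 26, 93, 17, 77, 31, 44, 55, 20]:
    quadratic_probe_insert(table, item)
print(table)
```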
An alternative method for handling the collision problem is to allow each slot to hold a reference to a collection (or chain) of items. Chaining allows many items to exist at the same location in the hash table. When collisions happen, the item is still placed in the proper slot of the hash table. As more and more items hash to the same location, the difficulty of searching for the item in the collection increases. Figure 12 shows the items as they are added to a hash table that uses chaining to resolve collisions.
When we want to search for an item, we use the hash function to generate the slot where it should reside. Since each slot holds a collection, we use a searching technique to decide whether the item is present. The advantage is that on the average there are likely to be many fewer items in each slot, so the search is perhaps more efficient. We will look at the analysis for hashing at the end of this section.
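Chaining can be sketched by storing a Python list in every slot. The helper names below are hypothetical, and a plain list stands in for whatever collection an implementation might actually use for each chain.

```python
def chained_insert(table, item):
    """Chaining: append the item to the list (chain) stored in its home slot."""
    table[item % len(table)].append(item)


def chained_search(table, item):
    """Hash to the home slot, then search sequentially within that slot's chain."""
    return item in table[item % len(table)]


table = [[] for _ in range(11)]  # one independent empty chain per slot
for item in [54, 26, 93, 17, 77, 31, 44, 55, 20]:
    chained_insert(table, item)

print(table[0])  # [77, 44, 55]
print(table[9])  # [31, 20]
print(chained_search(table, 55), chained_search(table, 66))  # True False
```

Note the list comprehension: `[[]] * 11` would create eleven references to the same list, so every item would land in one shared chain.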
Self Check

Q-1: In a hash table of size 13, which index positions would the following two keys map to? 27, 130

- 1, 10 (Be careful to use modulo, not integer division.)
- 13, 0 (Don't divide by two; use the modulo operator.)
- 1, 0 ($27 \bmod 13 = 1$ and $130 \bmod 13 = 0$.)
- 2, 3 (Use the modulo operator.)

Q-2: Suppose you are given the following set of keys to insert into a hash table that holds exactly 11 values: 113, 117, 97, 100, 114, 108, 116, 105, 99. Which of the following best demonstrates the contents of the hash table after all the keys have been inserted using linear probing?

- 100, __, __, 113, 114, 105, 116, 117, 97, 108, 99 (It looks like you may have been doing modulo 2 arithmetic. You need to use the hash table size as the modulo value.)
- 99, 100, __, 113, 114, __, 116, 117, 105, 97, 108 (Using modulo 11 arithmetic and linear probing gives these values.)
- 100, 113, 117, 97, 14, 108, 116, 105, 99, __, __ (It looks like you are using modulo 10 arithmetic; use the table size.)
- 117, 114, 108, 116, 105, 99, __, __, 97, 100, 113 (Be careful to use modulo, not integer division.)
6.5.3. Implementing the Map Abstract Data Type
One of the most useful Python collections is the dictionary. Recall that a dictionary is an associative data type where you can store key–data pairs. The key is used to look up the associated data value. We often refer to this idea as a map.
The map abstract data type is defined as follows. The structure is an unordered collection of associations between a key and a data value. The keys in a map are all unique, so each key is associated with exactly one value. The operations are given below.
- `Map()`: Create a new, empty map. It returns an empty map collection.
- `put(key, val)`: Add a new key-value pair to the map. If the key is already in the map then replace the old value with the new value.
- `get(key)`: Given a key, return the value stored in the map or `None` otherwise.
- `del`: Delete the key-value pair from the map using a statement of the form `del map[key]`.
- `len()`: Return the number of key-value pairs stored in the map.
- `in`: Return `True` for a statement of the form `key in map`, if the given key is in the map, `False` otherwise.
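For comparison, Python's built-in `dict` already supports each of these operations. The snippet below maps each one to the corresponding `dict` syntax; the keys and values are arbitrary.

```python
# Python's built-in dict provides every Map operation listed above.
m = {}                      # Map(): create an empty map
m[54] = "cat"               # put(54, "cat")
m[26] = "dog"
m[54] = "lion"              # put on an existing key replaces the old value
print(m.get(54))            # get(54) -> lion
print(m.get(99))            # get on a missing key -> None
del m[26]                   # del map[key]
print(len(m), 54 in m, 26 in m)  # 1 True False
```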
One of the great benefits of a dictionary is the fact that given a key, we can look up the associated data value very quickly. In order to provide this fast lookup capability, we need an implementation that supports an efficient search. We could use a list with sequential or binary search, but it would be even better to use a hash table as described above since looking up an item in a hash table can approach $O(1)$ performance.
In Listing 2 we use two lists to create a `HashTable` class that implements the Map abstract data type. One list, called `slots`, will hold the key items and a parallel list, called `data`, will hold the data values. When we look up a key, the corresponding position in the data list will hold the associated data value. We will treat the key list as a hash table using the ideas presented earlier. Note that the initial size for the hash table has been chosen to be 11. Although this is arbitrary, it is important that the size be a prime number so that the collision resolution algorithm can be as efficient as possible.
Listing 2
The `hashfunction` method implements the simple remainder method. The collision resolution technique is linear probing with a 'plus 1' rehash function. The `put` function (see Listing 3) assumes that there will eventually be an empty slot unless the key is already present in `self.slots`. It computes the original hash value and, if that slot is not empty, iterates the `rehash` function until an empty slot occurs. If a nonempty slot already contains the key, the old data value is replaced with the new data value. Dealing with the situation where there are no empty slots left is an exercise.
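A minimal sketch of what such a class might look like, written for this discussion rather than copied from Listing 3: two parallel lists, a remainder-method `hashfunction`, a 'plus 1' `rehash`, and a `put` that follows the steps just described. Raising `RuntimeError` when no empty slot remains is our own placeholder for the full-table case the text leaves as an exercise.

```python
class HashTable:
    """Sketch of a Map built from two parallel lists: slots (keys) and data (values)."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * self.size
        self.data = [None] * self.size

    def hashfunction(self, key, size):
        return key % size  # simple remainder method

    def rehash(self, oldhash, size):
        return (oldhash + 1) % size  # linear probing, "plus 1"

    def put(self, key, data):
        hashvalue = self.hashfunction(key, self.size)
        pos = hashvalue
        # Probe until we find an empty slot or the slot already holding this key.
        while self.slots[pos] is not None and self.slots[pos] != key:
            pos = self.rehash(pos, self.size)
            if pos == hashvalue:
                raise RuntimeError("hash table is full")
        self.slots[pos] = key
        self.data[pos] = data  # new entry, or replacement of the old value


h = HashTable()
for key, value in [(54, "cat"), (26, "dog"), (93, "lion"), (17, "tiger"),
                   (77, "bird"), (31, "cow"), (44, "goat"), (55, "pig"),
                   (20, "chicken")]:
    h.put(key, value)
h.put(20, "duck")  # existing key: value replaced in place
print(h.slots)
print(h.data)
```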
Listing 3
Likewise, the `get` function (see Listing 4) begins by computing the initial hash value. If the value is not in the initial slot, `rehash` is used to locate the next possible position. Notice that line 15 guarantees that the search will terminate by checking to make sure that we have not returned to the initial slot. If that happens, we have exhausted all possible slots and the item must not be present.
The final methods of the `HashTable` class provide additional dictionary functionality. We overload the `__getitem__` and `__setitem__` methods to allow access using `[]`. This means that once a `HashTable` has been created, the familiar index operator will be available. We leave the remaining methods as exercises.
Listing 4
The following session shows the `HashTable` class in action. First we will create a hash table and store some items with integer keys and string data values.
Next we will access and modify some items in the hash table. Note that the value for the key 20 is being replaced.
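Here is a self-contained sketch that adds `get`, `__getitem__`, and `__setitem__` to the same design and then walks through a session like the one described: storing items with integer keys and string values, reading them back, and replacing the value for key 20. The specific keys and strings, and the `RuntimeError` on a full table, are illustrative choices, not taken from the text's listings.

```python
class HashTable:
    """Sketch of a Map ADT: parallel `slots` (keys) and `data` (values) lists,
    remainder-method hashing and linear probing."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size
        self.data = [None] * size

    def hashfunction(self, key, size):
        return key % size

    def rehash(self, oldhash, size):
        return (oldhash + 1) % size

    def put(self, key, data):
        start = self.hashfunction(key, self.size)
        pos = start
        while self.slots[pos] is not None and self.slots[pos] != key:
            pos = self.rehash(pos, self.size)
            if pos == start:
                raise RuntimeError("hash table is full")
        self.slots[pos] = key
        self.data[pos] = data

    def get(self, key):
        start = self.hashfunction(key, self.size)
        pos = start
        while self.slots[pos] is not None:
            if self.slots[pos] == key:
                return self.data[pos]
            pos = self.rehash(pos, self.size)
            if pos == start:  # back at the initial slot: key is not present
                return None
        return None  # reached an empty slot: key is not present

    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, data):
        self.put(key, data)


H = HashTable()
for key, value in [(54, "cat"), (26, "dog"), (93, "lion"), (17, "tiger"),
                   (77, "bird"), (31, "cow"), (44, "goat"), (55, "pig"),
                   (20, "chicken")]:
    H[key] = value          # __setitem__ -> put

print(H[20])                # chicken
H[20] = "duck"              # replaces the value stored for key 20
print(H[20])                # duck
print(H[99])                # None: 99 hashes to slot 0 and the probe hits empty slot 7
```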
The complete hash table example can be found in ActiveCode 1.
6.5.4. Analysis of Hashing
We stated earlier that in the best case hashing would provide an $O(1)$, constant time search technique. However, due to collisions, the number of comparisons is typically not so simple. Even though a complete analysis of hashing is beyond the scope of this text, we can state some well-known results that approximate the number of comparisons necessary to search for an item.
The most important piece of information we need to analyze the use of a hash table is the load factor, $\lambda$. Conceptually, if $\lambda$ is small, then there is a lower chance of collisions, meaning that items are more likely to be in the slots where they belong. If $\lambda$ is large, meaning that the table is filling up, then there are more and more collisions. This means that collision resolution is more difficult, requiring more comparisons to find an empty slot. With chaining, increased collisions means an increased number of items on each chain.
As before, we will have a result for both a successful and an unsuccessful search. For a successful search using open addressing with linear probing, the average number of comparisons is approximately $\frac{1}{2}\left(1+\frac{1}{1-\lambda}\right)$ and an unsuccessful search gives $\frac{1}{2}\left(1+\left(\frac{1}{1-\lambda}\right)^2\right)$. If we are using chaining, the average number of comparisons is $1 + \frac{\lambda}{2}$ for the successful case, and simply $\lambda$ comparisons if the search is unsuccessful.
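To get a feel for these approximations, the sketch below evaluates them for a few load factors. The function names are hypothetical; note that the open-addressing formulas only make sense for $\lambda < 1$, while chaining allows $\lambda$ to exceed 1.

```python
def linear_probing_hit(lam):
    """Approximate comparisons for a successful search, linear probing."""
    return 0.5 * (1 + 1 / (1 - lam))


def linear_probing_miss(lam):
    """Approximate comparisons for an unsuccessful search, linear probing."""
    return 0.5 * (1 + (1 / (1 - lam)) ** 2)


def chaining_hit(lam):
    """Approximate comparisons for a successful search with chaining."""
    return 1 + lam / 2


def chaining_miss(lam):
    """Approximate comparisons for an unsuccessful search with chaining."""
    return lam


for lam in (0.25, 0.5, 0.75, 0.9):
    print(f"lambda={lam:.2f}  linear probing: hit {linear_probing_hit(lam):.2f}, "
          f"miss {linear_probing_miss(lam):.2f}  "
          f"chaining: hit {chaining_hit(lam):.2f}, miss {chaining_miss(lam):.2f}")
```

At $\lambda = 0.5$ these give 1.5 and 2.5 comparisons for linear probing and 1.25 and 0.5 for chaining; at $\lambda = 0.9$ an unsuccessful linear-probing search is already estimated at about 50 comparisons.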