31251 Lec 6.1: Hash Tables

In this video we talk about a data structure called a hash table, which implements the operations insert, lookup (contains), and erase. We motivate the design of a hash table by going back to the LeetCode problem "Contains Duplicate", where the task is to determine whether a vector of integers contains any number more than once. A hash table is well suited to solving this problem.
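As a rough illustration of why a hash table fits this problem, here is a minimal sketch that uses the standard library's std::unordered_set as the hash table. The function name containsDuplicate is our own choice for illustration, and this is not necessarily the solution developed in the video.

```cpp
#include <unordered_set>
#include <vector>

// Returns true if any value appears more than once in nums.
bool containsDuplicate(const std::vector<int>& nums) {
    std::unordered_set<int> seen;  // hash table of the values seen so far
    for (int x : nums) {
        // insert() returns a pair; its bool is false when x was already present.
        if (!seen.insert(x).second) {
            return true;
        }
    }
    return false;
}
```

Each element costs one insert, so with constant-time operations on average the whole scan is linear in the length of the vector.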

The basic idea behind a hash table is to use a hash function to associate each item we want to store with an index into an array. The first idea is just to store the item at this array location, which we can think of as a bucket. However, two distinct items can hash to the same bucket; this is called a hash collision. To handle collisions, each bucket can store its own data structure, such as a linked list, which holds all items that hash to that bucket. This method of resolving hash collisions is called separate chaining.
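The separate chaining idea can be sketched as follows. This is a simplified illustration under our own assumptions, not the implementation from the video: the class name ChainedHashSet, the default of 16 buckets, and the use of std::list for each bucket are all hypothetical choices, and the table never resizes.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

// A set of ints using separate chaining: each bucket holds a linked list.
// Assumes bucketCount > 0; the number of buckets is fixed (no rehashing).
class ChainedHashSet {
public:
    explicit ChainedHashSet(std::size_t bucketCount = 16) : buckets_(bucketCount) {}

    // Inserts key; returns false if it was already present.
    bool insert(int key) {
        auto& chain = buckets_[indexOf(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end()) return false;
        chain.push_back(key);
        ++size_;
        return true;
    }

    // Lookup: scan only the one chain that key hashes to.
    bool contains(int key) const {
        const auto& chain = buckets_[indexOf(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Removes key; returns false if it was not present.
    bool erase(int key) {
        auto& chain = buckets_[indexOf(key)];
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        --size_;
        return true;
    }

    std::size_t size() const { return size_; }

private:
    // Maps a key to a bucket index via the standard hash function.
    std::size_t indexOf(int key) const {
        return std::hash<int>{}(key) % buckets_.size();
    }

    std::vector<std::list<int>> buckets_;
    std::size_t size_ = 0;
};
```

All three operations do the same thing first: hash the key to pick a bucket, then work only within that bucket's chain. Their cost therefore depends on how long the chains get.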

The key to the performance of a hash table is for the hash function to distribute the items evenly over all the buckets. To argue that this happens we need some assumption about the inputs. We look at an assumption called simple uniform hashing, under which each item is equally likely to hash to any bucket, independently of the other items. From this one can argue that lookup in a hash table takes constant time on average, provided the number of items stays proportional to the number of buckets.
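The average-case argument can be sketched as a short calculation. The notation here is our own assumption rather than the video's: n is the number of stored items, m is the number of buckets, and the load factor is α = n/m. Simple uniform hashing is taken to mean that each item lands in any given bucket with probability 1/m.

```latex
% Assumed notation (not from the video): n items, m buckets, load factor \alpha = n/m.
% X_i is the indicator that item i hashes to a fixed bucket b.
\[
  \Pr[X_i = 1] = \frac{1}{m}
  \quad\Longrightarrow\quad
  \mathbb{E}[\text{length of chain } b]
  = \mathbb{E}\Big[\sum_{i=1}^{n} X_i\Big]
  = \sum_{i=1}^{n} \frac{1}{m}
  = \frac{n}{m} = \alpha .
\]
% An unsuccessful lookup computes the hash and then scans one whole chain:
\[
  \mathbb{E}[\text{lookup time}] = \Theta(1 + \alpha),
  \qquad \text{which is } \Theta(1) \text{ when } n = O(m).
\]
```

The second step uses linearity of expectation, which does not require the items' hash values to be independent of one another.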