Python Hash Sets Explained & Demonstrated - Computerphile


Hash Sets in Python work a little bit like the index of a book, giving you a shortcut to looking for a value in a list. Dr Mike Pound explains how they work and demos with some code.

#Python #HashSet #Code #Computerphile

This video was filmed and edited by Sean Riley.


Comments

"O(1) means that there is no relationship between the speed of the lookup and the number of elements in your collection". Couldn't have said it better, as often with big O, the devil is in the constant you dropped during notation :D

Davourflave

I invented and implemented this very scheme in 1978, on an HP9845A in HP Basic with a 20 MB hard disk, and discovered a few things:
1) Hash collisions are best stored in a sorted list, so that a binary search can be done, reducing the search time dramatically.
2) Hashing integers as themselves is a disaster in the real world, where initial keys of 0 proliferate. (Amongst other common integers, such as -1.)

MichaelKingsfordGray
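
The first point in the comment above can be sketched in Python. This is only an illustration, assuming a fixed number of buckets and items that can be compared with each other; the class name `SortedBucketSet` and its methods are hypothetical and are not taken from the video or from the 1978 implementation.

```python
import bisect


class SortedBucketSet:
    """Hash set whose collision buckets are kept as sorted lists (illustrative)."""

    def __init__(self, size=8):
        self._size = size
        self._buckets = [[] for _ in range(size)]

    def _bucket(self, item):
        # Pick the bucket from the item's hash.
        return self._buckets[hash(item) % self._size]

    def add(self, item):
        bucket = self._bucket(item)
        # Binary search for the insertion point so the bucket stays sorted.
        pos = bisect.bisect_left(bucket, item)
        if pos == len(bucket) or bucket[pos] != item:
            bucket.insert(pos, item)

    def __contains__(self, item):
        bucket = self._bucket(item)
        # Binary search instead of a linear scan through the collisions.
        pos = bisect.bisect_left(bucket, item)
        return pos < len(bucket) and bucket[pos] == item


s = SortedBucketSet(size=4)
for value in (12, 0, 8, 4, 3):  # 0, 4, 8 and 12 all land in bucket 0
    s.add(value)
print(8 in s, 5 in s)
```

On the second point, CPython also hashes small integers to themselves (`hash(0) == 0`), so integer keys that cluster around a few values will cluster in the table as well.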

Side note: if your list of values is static and known in advance, the GNU “gperf” tool can come up with a “perfect” hash function that gives you the minimum array size with no collisions. It generates C/C++ code, but the output should be portable to most other languages with a small amount of effort.

trevinbeattie

It's important to point out here that probability of collision can be reduced by increasing table size, but then your "utilization" of your table space will be lower. It's absolutely a trade-off and you can't improve one without degrading the other. As your table fills up, collisions will become more and more common. For example, if your table is 80% full, then almost by definition there's an 80% chance of a new item colliding with an existing one. The uniformity of the hash function pretty much guarantees it. There's a lot of nice probability theory analysis you can do around these things.

Of course, that 80% full table gives you a 64% chance of colliding on both of your first two tries, a 51.2% chance of still colliding after the third probe, a 41% chance after the fourth, and so on. The average length of the chains you wind up with goes up sharply as you push up utilization.

KipIngram
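
The arithmetic in the comment above can be checked with a few lines of Python. This is a sketch under the comment's own simplifying assumption that every probe independently hits an occupied slot with probability equal to the load factor; the function names are hypothetical.

```python
def all_probes_collide(load, k):
    """Probability that the first k probes all hit occupied slots."""
    return load ** k


def expected_probes(load):
    """Mean number of probes until a free slot is found (geometric distribution)."""
    return 1 / (1 - load)


load = 0.8
for k in range(1, 5):
    print(f"{k} probes: {all_probes_collide(load, k):.1%}")
print(f"expected probes at {load:.0%} full: {expected_probes(load):.1f}")
print(f"expected probes at 50% full: {expected_probes(0.5):.1f}")
```

Under that assumption an 80% full table needs about five probes on average to find a free slot, against two for a half-full table, which is the trade-off between utilization and collisions the comment describes.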

Not really interested in the topic, because I already know this, but still watched because Mike's presentations are always engaging

JaapVersteegh

I implemented this in Ada back in 1997 during a class in computer science. I think 73 was a pretty good prime to use while hashing to minimize collisions.

gustafsundberg

Implemented exactly the simple form of this in a commercial compiler around 1980 to store the symbol table (the list of all identifiers defined in the program being compiled, with their type, size, etc.). It was chosen for lookup speed, as the symbol table is accessed frequently during compilation.

Richardincancale

Bloom filters could be a good follow-up to this.

prkhrsrvstv

I saw an interview question video yesterday about these - really good timing for me this video. 😁😁😁

bensmith

Oftentimes on modern hardware, particularly on small datasets, a linear scan can be faster than a hashmap lookup because the hashing function is slow.

jonny__b
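
One way to test the claim above on a given machine is a small `timeit` comparison. This is a rough sketch, not a rigorous benchmark: the collection size, the key type, and the repetition count are arbitrary assumptions, and the outcome depends on the hardware, the interpreter, and how expensive the hash function is for the keys in question, so no expected timings are given.

```python
import timeit

small_list = list(range(8))   # tiny dataset, as in the comment
small_set = set(small_list)
target = 7                    # worst case for the linear scan

# Time membership tests against the list (linear scan) and the set (hash lookup).
scan_time = timeit.timeit(lambda: target in small_list, number=100_000)
hash_time = timeit.timeit(lambda: target in small_set, number=100_000)

print(f"list scan:  {scan_time:.4f}s")
print(f"set lookup: {hash_time:.4f}s")
```

Integers are cheap to hash in CPython, so keys with a costlier hash, such as long strings or tuples, may give a different picture.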

I watched a talk on Python dictionaries in which the developer who worked on the new implementation went into detail about how they are more closely related to databases than to hash maps. It was done to increase performance, and since almost everything in Python has a backing dictionary, it made a large difference in runtime.

jfftck

I'm not new to programming, but I'm new to Python and I was just literally looking into what uses hash tables in Python. Thanks. Lol

Loki-

12:02 in the `__contains__` function there shouldn't be an `else:` before the `return False` at the end. Otherwise, if the list `self._data[idx]` is not `None` and the item is not in that list, the return value won't be a boolean.

ibrahimmahrir
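
The fix described in the comment above might look like the following. The video's code is not reproduced on this page, so this is a reconstruction that assumes a table of slots holding either `None` or a list of items; only `self._data` and `idx` come from the comment, while the class name `SimpleHashSet` and everything else are hypothetical.

```python
class SimpleHashSet:
    def __init__(self, size=16):
        self._size = size
        self._data = [None] * size  # each slot is None or a list of items

    def add(self, item):
        idx = hash(item) % self._size
        if self._data[idx] is None:
            self._data[idx] = []
        if item not in self._data[idx]:
            self._data[idx].append(item)

    def __contains__(self, item):
        idx = hash(item) % self._size
        if self._data[idx] is not None:
            for stored in self._data[idx]:
                if stored == item:
                    return True
        # No `else:` here. This line is reached for an empty slot and also
        # for a non-empty slot that does not hold the item.
        return False


s = SimpleHashSet()
s.add(1)
print(s.__contains__(17))  # 17 shares slot 1 with the stored item but is absent
```

With the stray `else:`, the direct call `s.__contains__(17)` would fall off the end of the method and return `None`. The `in` operator converts the result to a boolean, so `17 in s` would still evaluate to `False`, which is why the bug is easy to miss.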

Thanks for your videos Mike keep it rolling 🎉

exchange

The built-in `array` structure in PHP is mostly a hashmap, and is extremely widely used. Arguably a bit too widely used sometimes, since programmers often use it with strings that they have chosen in advance as the keys, and data supplied at runtime only as the values. In that situation replacing it with a class, with a predefined set of properties known to the compiler, both improves performance and can make the program easier to understand.

barneylaurance

The topics discussed on this channel have been the ones that really specifically interest me as of late. This is cool, thank you!

cinderwolf

Just what I wanted now. Thanks a lot :)

princevijay

Hi, could you do a video on characteristics of a good hash function used in hashtables and their evaluation as a followup video?

olivergurka

A follow-up on the amortized complexity would be nice. Because it's a bit disingenuous to call the hashmap insertion O(1) if the underlying table doesn't grow. ^^

Ceelvain
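
The point about table growth can be illustrated with a set that doubles its table once a load threshold is crossed. This is a minimal sketch, assuming separate chaining with list buckets and a maximum load factor of 0.75; the class name `GrowingHashSet` and both parameters are hypothetical choices, not the video's code and not how CPython's built-in `set` is implemented.

```python
class GrowingHashSet:
    def __init__(self, size=8, max_load=0.75):
        self._size = size
        self._count = 0
        self._max_load = max_load
        self._data = [[] for _ in range(size)]

    def _resize(self):
        # Doubling costs O(n), but it happens rarely enough that the
        # average cost per insertion stays constant (amortized O(1)).
        old_items = [item for bucket in self._data for item in bucket]
        self._size *= 2
        self._data = [[] for _ in range(self._size)]
        for item in old_items:
            self._data[hash(item) % self._size].append(item)

    def add(self, item):
        bucket = self._data[hash(item) % self._size]
        if item in bucket:
            return
        bucket.append(item)
        self._count += 1
        if self._count > self._max_load * self._size:
            self._resize()

    def __contains__(self, item):
        return item in self._data[hash(item) % self._size]

    def __len__(self):
        return self._count


s = GrowingHashSet()
for n in range(100):
    s.add(n)
print(len(s), 42 in s, 100 in s)
```

Without the resize step the buckets would keep getting longer as items are added, so lookups and insertions would slow down roughly in proportion to the number of items divided by the fixed table size.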

13:30 I get that it's easier to generate numbers, but I think pedagogically it makes much more sense to use strings.

import random

with open('/usr/share/dict/words') as f:
    words = [w.rstrip() for w in f]
print(random.choices(words, k=10))

cacheman
