A forbidden Python technique to put ANYTHING in a dict or set.

preview_player
Показать описание
Use with caution!

Common programming wisdom tells us that mutable objects should not be hashable since mutating the object might change its hash. But sometimes you really just want to have a dict or set of mutable things that you promise won't have their hashes change while in use. In Python there is a devious trick to do just that. Its legitimate use cases are very niche, but in a pinch you can use the identity of an object as as proxy for hashing the object itself. Learn how in this intermediate level Python video!

SUPPORT ME ⭐
---------------------------------------------------
Sign up on Patreon to get your donor role and early access to videos!

Feeling generous but don't have a Patreon? Donate via PayPal! (No sign up needed.)

Want to donate crypto? Check out the rest of my supported donations on my website!

Top patrons and donors: Jameson, Laura M, Dragos C, Vahnekie, Neel R, Matt R, Johan A, Casey G, Mark M, Mutual Information, Pi

BE ACTIVE IN MY COMMUNITY 😄
---------------------------------------------------

CHAPTERS
---------------------------------------------------
0:00 Intro - hashing
1:20 Alternatives
1:44 Object identity
2:15 FORBIDDEN - IdMapping
3:14 FORBIDDEN - IdSet
3:44 See it in action
4:00 The downside
5:07 Thanks
Рекомендации по теме
Комментарии
Автор

Some forbidden knowledge comes in handy! Thanks James as always for the awesome content you put up!

Phaust
Автор

I actually use this technique from time to time, especially in graphics/game stuff

Ganerrr
Автор

"In addition to making esoteric and questionable python content on youtube" 😂

KappakIaus
Автор

This type of hashing is actually quite popular and useful in the games, where each entity has its own unique ID, but cloning is not a common thing, and cloned entity is not equal to its donor even though they are identical.

PeterZaitcev
Автор

Interesting you chose 257 to demonstrate the problem. Is that b/c ints 256 and below always have the same id? I believe I recall a fact like that.

Mutual_Information
Автор

mad respect to the author for breaking down the twin snake sacrifice shoot technique in python, especially since this is some advanced wizardry that many don't dive into

but real talk, while it's cool to learn about these forbidden techniques, relying on them in production code might be a big yikes

using object IDs for hashing might lead to confusing behaviors for future devs who come across your code

always good to know the tricks of the trade, but there's often a reason some techniques are labeled "forbidden"

still, props for the deep dive and making it understandable

moondevonyt
Автор

Huh, I just stumbled upon this solution myself literally 3 days ago by having "def __hash(self)__: return id(self)" for a class where I knew the objects would live for the whole run time and "is" was as good as "==". I can see the downsides, but it is so helpful. In some cases, it can even be faster I think? hash() can be slow in some situations (ie, long or nested tuples), but id() is always very fast I believe?

QuantumHistorian
Автор

I'm pretty chuffed that I know exactly why your int(float()) example uses 257, without having to look it up.

gloweye
Автор

🎯 Key Takeaways for quick navigation:

00:00 🛍️ Hashing in sets and dictionaries for efficient lookups is like finding milk in a grocery store's refrigerated section.
00:28 🔄 Hash collisions degrade performance; mutating an object changes its hash.
00:55 🔢 Mutable things should not be hashable in Python; defining hash or setting it to None is advised.
01:21 🧩 Using an object's identity to create custom mapping and set types for unhashable/mutable elements.
02:15 🗺️ Mapping type uses id of key for lookups; ensures key is not garbage collected.
03:11 🔑 Set type uses id of value for existence checks; prevents value from being garbage collected.
04:06 🔍 In IdSet, duplicates require exact same objects, not just equal ones.
04:33 ⚠️ Using IdMapping or IdSet can be confusing; forbidden Python knowledge.
05:01 🚫 Forbidden technique useful only when certain objects are consistently used.
05:27 💼 Creator's disclaimer; the technique has potential but is discouraged.

Made with HARPA AI

onhazrat
Автор

You could make the IdSet/IdMapping use the id of a key only when it is emutable or when it doesn't have a hash method. That way, you narrow the problem to only the case when you want to be able to acess the same value with different and equal keys. To go around this problem, you could set extra data on initialisation th specify what emutable types abould use identity and what thpes should use equality for keys made from those types. Most of the times, the keys in sets and dics would be of the same type, so you could define IdSet/Idmapping subclasses that should have only one type of keys (which is already something that exists in the standard library, as defaultdict), just to make the usage clear, and safe.

shahaffrsshahaffrs
Автор

I have said it before and I will say it again: this is THE premier advanced python channel on YouTube

benshapiro
Автор

I guess confusion between mutability and hashabilty should be addressed somehow. The idea that if we mutate the object it's has is going to change is not entirely true. There could be a lot of real-life scenarious when mutable objects are actually hashable and that don't cause any problems. We just need to make sure of 2 things:
1. Hash is compatible with equality, meaning there should not be any objects that are equal but has different hash. That is the real (and documented) reason why __hash__ is implicitly set to None if we define __eq__ in class but not define __hash__ itself. Default __hash__ implementation would most likely be incompatible with modified __eq__. Notice that this has nothing to do with mutability of an object, it just prevents random breakage of Hashable protocol.
2. Hash (and eq) should not depend on mutable state. That is not strictly necessary for some operations, but it is necessary when hash is used for looking up things that might change over time. Basically objects could be mutable and you might even create two separate objects with different attribute and still have them equivalent if only a subset of their state (values of attributes) participate in hash calculation and __eq__ operation. For example you could ignore some comments or other unimportant values.

And you could also has it the other way around: immutable object could be not hashable. That's rare but sometimes reasonable.

Currently official python documentation is unclear about some of this stuff and it would be nice to reword it better.

evlezzz
Автор

I've had to use a similar technique to implement FFI over mutable objects, that is two objects which have a shared state between two languages by propagating any mutation to the other side. I need to have a consistent ID for this object on both sides of the fence, and which object corresponds to this ID, and I need fast way to remember which objects are managed without keeping them alive. The result is that I need a *weak* and *invertible* mapping between ID and object.

In your solution this is decidedly not a weak mapping, but if you did want to make it weak then you have to add weakref finalizers to the objects so that they propagate their destruction to the cache and the other side so we remove their ID from the mapping.

MagicGonads
Автор

There are occasions that I've wished something like this was built-in (in "collections", maybe)... or related things like a version of "in" for lists that used "is" instead of "=="...

It doesn't come up often, and it would be a confusing problem if it was used more commonly than it should be, but when you want it you know.

mrphlip
Автор

I was wondering why I could store class objects in sets even if they are mutable. This explains it!

samuelthecamel
Автор

I have generally only needed something like this for lists of integers. And for that there is a simpler hack of converting the keys to tuples of integers

timseguine
Автор

Thank you, each of your vids widens my understanding on Python!

vnikolayev
Автор

My solution to this is tracking the hashable objects by their own hash and the unhashable ones by their id.

aouerfelli
Автор

I recently had to simulate objects moving through a 2d grid, and because their position needed to be updated and overlaps were allowed I couldn't hashed them based on their state, so this forbidden techniche came in handy.

antoniov.fuentes
Автор

Thinking about it, it would probably be better to implement this as a fallback, so only objects without hashing will use their id instead

SkyyySi