Faster than Rust and C++: the PERFECT hash table

I had a week of fun designing and optimizing a perfect hash table. In this video, I take you through the journey of making a hash table 10 times faster and share performance tips along the way.

00:00 why are hash tables important?
00:31 how hash tables work
02:40 a naïve hash table
04:35 custom hash function
08:52 perfect hash tables
12:03 my perfect hash table
14:20 beating gperf
17:24 beating memcmp
21:46 beating SIMD
26:01 even faster?
30:06 pop quiz answers
31:45 beating cmov
33:09 closing thoughts
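
For readers who have not met the idea from the "perfect hash tables" chapter: a perfect hash table picks a hash function that produces no collisions for a fixed, known set of keys. The following is only a minimal sketch under assumptions of mine, not the implementation from the video: the keyword list, the FNV-1a-style hash, the brute-force seed search, and every name (`kKeys`, `seeded_hash`, `PerfectTable`, ...) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string_view>

// Hypothetical key set, chosen only for illustration.
constexpr std::array<std::string_view, 6> kKeys = {
    "if", "do", "for", "let", "var", "while"};
constexpr std::size_t kSlotBits = 5;  // 32 slots for 6 keys
constexpr std::size_t kSlots = std::size_t{1} << kSlotBits;

// FNV-1a-style hash; the seed replaces the usual offset basis.
inline std::uint32_t seeded_hash(std::string_view s, std::uint32_t seed) {
    std::uint32_t h = seed;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Uses the top bits of the hash as the slot index.
inline std::size_t slot_of(std::string_view s, std::uint32_t seed) {
    return seeded_hash(s, seed) >> (32 - kSlotBits);
}

class PerfectTable {
public:
    PerfectTable() {
        // Brute-force a seed for which every key lands in its own slot.
        for (std::uint32_t seed = 1; seed < 100000; ++seed) {
            std::array<int, kSlots> slots;
            slots.fill(-1);
            bool ok = true;
            for (std::size_t i = 0; i < kKeys.size() && ok; ++i) {
                std::size_t s = slot_of(kKeys[i], seed);
                if (slots[s] != -1) {
                    ok = false;  // collision: try the next seed
                } else {
                    slots[s] = static_cast<int>(i);
                }
            }
            if (ok) {
                seed_ = seed;
                slots_ = slots;
                return;
            }
        }
        throw std::runtime_error("no collision-free seed found");
    }

    // Returns the key's index in kKeys, or nullopt if s is not a key.
    // One hash, one slot read, one string comparison; no probing.
    std::optional<std::size_t> find(std::string_view s) const {
        int i = slots_[slot_of(s, seed_)];
        if (i >= 0 && kKeys[static_cast<std::size_t>(i)] == s) {
            return static_cast<std::size_t>(i);
        }
        return std::nullopt;
    }

private:
    std::uint32_t seed_ = 0;
    std::array<int, kSlots> slots_{};
};

// Usage sketch: a key is found at its index, a non-key is rejected.
inline void perfect_table_demo() {
    PerfectTable table;
    assert(table.find("while") == std::optional<std::size_t>{5});
    assert(!table.find("goto"));
}
```

Because the seed search guarantees that no two keys share a slot, a lookup never needs a collision chain. The final string comparison is still required to reject inputs that are not keys at all.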

Thanks:

Attribution:
Thumbnail artwork by Jennipuff
JavaScript logo by Christopher Williams under MIT license
PHP logo copyright Colin Viebrock, Creative Commons Attribution-Share Alike 4.0 International
Ruby logo copyright © 2006, Yukihiro Matsumoto, Creative Commons Attribution-ShareAlike 2.5
Comments
Author

Click 'read more' for performance tips with timestamps 👇

Source code of all of the techniques (for educational purposes) (warning: poorly organized):

Production C++ implementation of my hash table:

My JavaScript/TypeScript compiler project:

Rust reimplementation of my hash table:

Performance tips:
04:16 1. Try it
08:18 2. Make assumptions about your data
16:24 3. See how other people did it
20:08 4. Improve your data
20:28 5. Keep asking questions
21:17 6. Smaller isn't always better
24:57 7. Use multiple profiling tools
25:13 8. Benchmark with real data
29:46 9. Keep trying new ideas
32:42 10. Talk about your optimizations

strager_
Author

Extreme optimization can be an addicting game. Very cool to see it documented.

ijlijl
Author

I can't believe I watched a 34 minute video on hashing without getting bored and wandering off, this was an amazingly interesting talk.

spokesperson_usa
Author

This video was incredibly easy to watch, you had my attention the whole time. I really hope the algorithm blesses you, because you deserve it.

BuRRak
Author

Knowing your data is such a HUGE programming concept that isn’t talked about nearly as much as it should be.

ClayMurray
Author

This guy is every programmer from each generation wrapped into one. I love this dude.

robrick
Author

The 'c' in _mm_cmpestrc is because it returns the carry flag. What the carry flag actually means depends on the control byte.

Pence
Author

Fun fact: the "easiest hash function being the length" is actually why PHP has so many weirdly long function names. In early versions, that's actually how the hashmap for globals would work, and the developers thought it was easier (at least early on) to just come up with lots of different-length names than to replace the underlying structure (which they did later anyway, but at that point they didn't want to rename everything).

nonchip
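
The length-as-hash scheme described in the comment above can be sketched as follows. This is only an illustration of why distinct name lengths matter under such a hash; `length_hash` and `count_collisions` are hypothetical helper names, and nothing here is taken from PHP's source.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// The "hash" is simply the length of the name.
inline std::size_t length_hash(std::string_view name) {
    return name.size();
}

// Counts how many names land in a bucket that an earlier name already took.
inline std::size_t count_collisions(const std::vector<std::string_view>& names,
                                    std::size_t buckets) {
    std::vector<bool> used(buckets, false);
    std::size_t collisions = 0;
    for (std::string_view name : names) {
        std::size_t b = length_hash(name) % buckets;
        if (used[b]) {
            ++collisions;
        } else {
            used[b] = true;
        }
    }
    return collisions;
}

// Usage sketch: names of equal length collide, names of distinct length do not.
inline void length_hash_demo() {
    // Lengths 6, 6, 7, 7: two of the four names hit an occupied bucket.
    assert(count_collisions({"strlen", "strcmp", "explode", "implode"}, 32) == 2);
    // Lengths 5, 6, 7, 11, 16: every name gets its own bucket.
    assert(count_collisions({"nl2br", "strlen", "explode", "str_replace",
                             "htmlspecialchars"}, 32) == 0);
}
```

With a hash this weak, the only way to avoid collisions is to change the keys themselves, which is the trade-off the comment describes.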
Author

You would not really feel that this is a 33-minute video. The brief discussion in the earlier sections made it easy to digest the things you talked about in the later sections. It's hard not to imagine you were rubber ducking while making the video, because the way you explain things is so relatable.

daleryanaldover
Author

24:38 - I nearly lost it when you showed off those numbers. It's incredible to see what someone with in-depth knowledge can achieve. Appreciate the demonstration and advice!

Spookyhoobster
Author

This guy clearly has a knack for teaching. I have not seen an explanation this clear and engaging in a long time.

linkertv
Author

"Your data is never random"
I just want to say Bravo!
What a pleasure to watch a Data structure video from a real-world perspective!
Subscribe.

e
Author

Smaller memory footprint *does* matter for performance, because caches. In this example, the table is just too small already to see that effect.

cmilkau
Author

This video was truly amazing. Wow. I really appreciate all of the work you've put into this. I've always known that I could write my own assembly, but I've figured that the compiler would always optimize better than I could. Seeing how you were able to profile and optimize your code made me realize that I would have eventually come to some of the same conclusions - so I should probably at least give it a shot myself.
Thank you for making this video. This should be a required viewing for any computer science college students at some point in their education.

PS. I was slightly annoyed that you didn't dig into why the binary search was slower at the very beginning of the video until you kept going deeper and deeper into the different inner workings of all of these different hash algorithms and optimizations, making me realize that this is meant to be a densely-packed hash table video, not a binary search video. Thanks again.

Galakyllz
Author

The other thing that I think makes calling memcmp slow is that the branch predictor gets thrown off inside memcmp because memcmp is called from other places in the program with other data; with the single character check the branch predictor has the luxury of only seeing that branch used for the keyword lookup and nothing else.

furl_w
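
The single-character check mentioned in the comment above might look roughly like this. It is a minimal sketch, not the code from the video: `equal_with_first_byte_check` is a hypothetical name, and whether it actually beats a plain `memcmp` call depends on the data and the compiler, so it needs to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Compares the first byte inline and only calls memcmp when it matches.
// The inline comparison has its own branch at this call site, and most
// non-matching candidates are rejected without a function call.
inline bool equal_with_first_byte_check(const char* a, const char* b,
                                        std::size_t n) {
    if (n == 0) {
        return true;  // empty ranges are equal
    }
    if (a[0] != b[0]) {
        return false;  // cheap early rejection
    }
    return std::memcmp(a, b, n) == 0;
}

// Usage sketch.
inline void first_byte_check_demo() {
    assert(equal_with_first_byte_check("while", "while", 5));
    assert(!equal_with_first_byte_check("for", "let", 3));      // rejected by first byte
    assert(!equal_with_first_byte_check("while", "whale", 5));  // rejected by memcmp
}
```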
Author

clicked for the thumbnail, stayed for hash tables

nilusnilus
Author

This is amazing! I never expected you'd be able to get more than 10x faster than builtin hashes. I didn't expect how wildly the performance would vary from modifying your algorithm, assumptions about the input data and assembly instruction optimisations. Seeing the numbers go up and up with your thorough explanations of each modification you made tickled my brain. I'll definitely be coming back here for more!

Rose-eche
Author

Amazing video, hash tables are such an underrated concept, everyone uses them but few ever mention them or how they work, let alone do that in depth, you're awesome!

markzuckerbread
Author

That was insane. I've never seen someone walk through such a deep optimization. Thanks so much for this video!

andydataguy
Author

This video has been popping up in my suggestions for the past two months and I'm so glad I've finally watched it. I can see how these concepts still apply, even if you don't end up going down the rabbit hole and writing assembly instructions. Thanks a lot!

osamaaj