Hash Table in C

Chapters:
- 0:00:00 - Announcement
- 0:00:41 - Why Implement Hash Table?
- 0:02:07 - Where Could We Use the Hash Table?
- 0:03:15 - New Project
- 0:03:40 - Nob build
- 0:07:54 - The Problem Description
- 0:08:48 - Downloading all the Works of Shakespeare
- 0:10:40 - Reusability of Nob
- 0:11:12 - Reading the File
- 0:13:09 - Tokenizing the File
- 0:16:23 - Better Tokenization
- 0:20:28 - Linear Associative Array
- 0:23:58 - Total Amount of Tokens
- 0:25:00 - find_key()
- 0:27:07 - Collecting Frequencies
- 0:29:03 - Nob Subcommands
- 0:31:13 - Smaller File
- 0:32:36 - File Path via Command Arguments
- 0:35:32 - Measuring Time
- 0:36:33 - Sorting Frequencies
- 0:37:39 - compare_freqkv_count()
- 0:38:16 - Generic Comparison in C
- 0:40:05 - Generic Comparison in Other Languages
- 0:40:45 - Generic Comparison in Rust
- 0:42:30 - Subtraction is All You Need
- 0:44:15 - Comparison is Basically Subtraction
- 0:44:39 - Descending Order
- 0:44:57 - Printing Top 10 Frequent Tokens
- 0:47:23 - Cleaning up logging
- 0:49:01 - Elapsed time
- 1:00:29 - Concatenating Files
- 1:02:07 - How Does Linear Search Work?
- 1:03:31 - The Hash Table
- 1:07:39 - Hacking Hash in Competitive programming
- 1:09:02 - Q: Have I done Competitive Programming before?
- 1:10:14 - What's the Competition in Competitive Programming?
- 1:11:11 - hash()
- 1:12:32 - naive_analysis()
- 1:15:05 - Iterating tokens
- 1:15:59 - Looking at the Hashes
- 1:17:33 - When Hashes Collide?
- 1:24:40 - Better Hash Function
- 1:26:55 - Open Addressing
- 1:28:44 - Probing Strategies
- 1:29:33 - Reflection
- 1:31:29 - Resolving Collisions
- 1:34:35 - Table Overflow
- 1:40:44 - Abstraction
- 1:46:41 - Adjusting Hash Table Parameters
- 1:47:34 - Sorting Final Results
- 1:48:59 - hash_analysis()
- 1:51:53 - Comparing execution time
- 1:54:05 - Sum is a bad Hash Function
- 1:55:25 - Testing Bigger Files
- 1:57:03 - djb2
- 2:03:18 - Further Abstraction
- 2:06:43 - Summary
- 2:07:08 - Potential Malicious File
- 2:07:53 - Just Do Dumb Things!
- 2:11:15 - Outro

Support:
- BTC: bc1qj820dmeazpeq5pjn89mlh9lhws7ghs9v34x9v9
Comments

Tsoding has finally heard us, Gentiles, giving us the content that's on our level. The Lord has heard my prayers, amen. 😭😭

samuraijosh

I've wondered for some time how he got this good, but when implementing hash tables in C reminds you of your childhood, that's probably the answer.

maxmustermann

My go-to hash table implementation uses an array of arrays to handle collisions, but I've also used an array of trees, an array of sorted lists, and all manner of other combinations. I always use a power of two for the table size so I can normalize the hash with a single bitwise AND, and I store the unnormalized hash alongside the keys to make collision checks faster. I also like a handy trick to speed up deletions, if that's an operation you even need to support: I swap the key to be deleted with the last key in the table, fix up the indices for that key, then decrement the length of the table. It only reorders two elements per deletion, so it's a lot faster if you don't need the data in sorted order after every modification, or for that matter need to preserve the order. I've used a DJB hash function every time I've needed string keys, and I kind of flip-flop on whether *31 or *33 is better, but I'd say play with it for whatever data you're using, and either it won't matter or one will be just a bit better. Though I always initialize it with 5381, as I figure that constant is what designates it as a DJB hash. Maybe I'm just weird.

anon_y_mousse
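
The layout described in the comment above could look roughly like the sketch below. It is one possible reading, not the commenter's actual code: the names `Table`, `Bucket`, `Entry`, `hash_str`, `table_find`, `table_increment` and `table_remove`, as well as the bucket count of 64, are all made up for illustration. Each bucket is a small growable array, the bucket index comes from a bitwise AND with a power-of-two size, the full hash is stored to reject mismatches before calling `strcmp`, and removal overwrites the deleted entry with the last one in its bucket.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_COUNT 64u /* hypothetical size; must be a power of two */

typedef struct {
    uint32_t hash;   /* full, unnormalized hash kept for fast rejection */
    const char *key; /* borrowed pointer, the table does not own it */
    size_t count;
} Entry;

typedef struct {
    Entry *items;
    size_t len, cap;
} Bucket;

typedef struct {
    Bucket buckets[BUCKET_COUNT];
} Table;

/* DJB-style string hash: start at 5381, multiply by 33, add each byte. */
uint32_t hash_str(const char *s)
{
    uint32_t h = 5381;
    for (; *s != '\0'; ++s) h = h * 33 + (unsigned char)*s;
    return h;
}

/* Normalize the hash with a mask instead of a modulo. */
Bucket *bucket_for(Table *t, uint32_t h)
{
    assert((BUCKET_COUNT & (BUCKET_COUNT - 1)) == 0);
    return &t->buckets[h & (BUCKET_COUNT - 1)];
}

Entry *table_find(Table *t, const char *key)
{
    uint32_t h = hash_str(key);
    Bucket *b = bucket_for(t, h);
    for (size_t i = 0; i < b->len; ++i) {
        /* Compare the stored hashes first; strcmp only runs on a match. */
        if (b->items[i].hash == h && strcmp(b->items[i].key, key) == 0)
            return &b->items[i];
    }
    return NULL;
}

/* Adds one to the count of key, inserting it if absent.
   Returns 1 on success and 0 if memory allocation fails. */
int table_increment(Table *t, const char *key)
{
    Entry *e = table_find(t, key);
    if (e != NULL) {
        e->count += 1;
        return 1;
    }
    uint32_t h = hash_str(key);
    Bucket *b = bucket_for(t, h);
    if (b->len == b->cap) {
        size_t cap = b->cap ? b->cap * 2 : 4;
        Entry *items = realloc(b->items, cap * sizeof *items);
        if (items == NULL) return 0;
        b->items = items;
        b->cap = cap;
    }
    b->items[b->len++] = (Entry){ h, key, 1 };
    return 1;
}

/* Swap-with-last removal: order inside the bucket is not preserved.
   Returns 1 if the key was present and 0 otherwise. */
int table_remove(Table *t, const char *key)
{
    Entry *e = table_find(t, key);
    if (e == NULL) return 0;
    Bucket *b = bucket_for(t, e->hash);
    *e = b->items[--b->len];
    return 1;
}
```

A zero-initialized `Table` is a valid empty table here, so `Table t = {0};` is enough to start counting tokens with `table_increment(&t, token)`.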

"Implementing hashtables in C kinda remind me my childhood" 💀💀

snw

At 43:43, the real reason is that the subtraction trick only works when the difference fits in the result type, which roughly means operands whose magnitude is below half the maximum representable value. With `int` operands, `-INT_MAX - INT_MAX` overflows, and with `size_t` counts the difference wraps around and is then truncated on conversion to the `int` return value, so the comparison fails. In your case, it's irrelevant because there are not that many items in the hash table, and `SIZE_MAX` is very large in practice. A way of coding the logic without risking overflow and without too much branching is `(b < a) - (a < b)`.

ElPikacupacabra
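
As a minimal sketch of the overflow-free comparison suggested in the comment above, here is a `qsort` comparator for descending counts. The `FreqKV` struct layout and the names `compare_count_desc` and `sort_by_count_desc` are assumptions made for this example; the stream's own `compare_freqkv_count()` may look different.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical key/count pair, assumed for illustration. */
typedef struct {
    const char *key;
    size_t count;
} FreqKV;

/* Descending order by count. Each relational test yields 0 or 1,
   so the result is always -1, 0 or 1 and nothing can overflow. */
int compare_count_desc(const void *pa, const void *pb)
{
    const FreqKV *a = pa;
    const FreqKV *b = pb;
    return (a->count < b->count) - (b->count < a->count);
}

void sort_by_count_desc(FreqKV *items, size_t n)
{
    qsort(items, n, sizeof *items, compare_count_desc);
}
```

Swapping the two operands of the subtraction gives ascending order, which is the `(b < a) - (a < b)` form from the comment.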

I learnt to make a hash table in Harvard's free CS50 course, and it's pretty similar to this, so it's pretty cool. Writing collections is pretty fun.

jolynele

You can also use the `clock()` function for CPU time. It returns the processor time the program has used so far, in ticks; subtract two readings and divide by `CLOCKS_PER_SEC` to get the seconds spent on the operation.

ic
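
A minimal sketch of timing with the standard `clock()` function mentioned in the comment above. The helper name `cpu_seconds_between` is made up for this example. Note that `clock()` measures processor time used by the program, not wall-clock time.

```c
#include <time.h>

/* Converts two clock() readings into seconds of processor time.
   The tick difference is taken first, then converted to double. */
double cpu_seconds_between(clock_t begin, clock_t end)
{
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

Typical use is to call `clock()` before and after the work being measured and pass both readings to the helper.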

I am now subscribed. I just found these streams and absolutely love them. The rant at the end sealed the deal.

divingeveryday

Wow, nob looks pretty freaking cool! Make sure your clock is just a wrapper for `rdtsc` with calibration. Also, to preserve precision, subtract prior to conversion to double and then do your multiplication*.

(Edit*) It's a minor detail that rarely matters, but it's best to be in the habit of maximizing both accuracy and precision while ALSO maximizing performance (fewer FP ops = less round-off error and less FP mantissa mismatch error, e.g. subtracting a very small FP value from a big FP value will always give the big FP value, no matter how many times you do it).

robmorgan
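
The "subtract before converting" advice above can be sketched with `struct timespec` readings, such as those returned by `clock_gettime`. This is an illustration under that assumption, not the timer used in the stream, and the name `elapsed_seconds` is hypothetical. Converting a full timestamp of about 1.7e9 seconds to `double` first leaves a resolution of only a few hundred nanoseconds, while integer subtraction keeps the difference exact until the final conversion.

```c
#include <time.h>

/* Elapsed time between two timestamps, in seconds.
   The integer fields are subtracted first; only the small
   differences are converted to double and scaled. */
double elapsed_seconds(struct timespec begin, struct timespec end)
{
    time_t sec = end.tv_sec - begin.tv_sec;
    long nsec = end.tv_nsec - begin.tv_nsec; /* may be negative */
    return (double)sec + (double)nsec * 1e-9;
}
```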

This is actually an exercise in The C Programming Language by K&R

AlmogD

I know it's not your goal. But with every video I get the idea that you're getting more and more into engineering standards.


Perfect timing, I have been implementing some based data structures in C++ for fun these past few days.

harleyspeedthrust

Let's Go we're back to C again

afterschool

you absolutely murdered that semicolon backseating guy

snr

I like the exploratory style. Very interesting lessons learned!

puzzlinggamedev

Love how the djb2 hash function is basically the same thing he did except it starts at 5381 instead of 0 and multiplies by 33 instead of 31

felixbilodeau-chagnon
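
To make the comparison in the comment above concrete, here is a sketch of both variants built on one shared loop. The names `hash_mul` and `hash_31` are made up for this example, and the 32-bit width is an assumption; the classic djb2 is usually written with `unsigned long`, and the function from the stream is only known here through the comment's description.

```c
#include <stdint.h>

/* Multiplicative string hash: h = h * mul + byte, starting at seed.
   Unsigned arithmetic wraps around, which is well defined in C. */
uint32_t hash_mul(const char *s, uint32_t seed, uint32_t mul)
{
    uint32_t h = seed;
    for (; *s != '\0'; ++s) h = h * mul + (unsigned char)*s;
    return h;
}

/* Variant described in the comment: start at 0, multiply by 31. */
uint32_t hash_31(const char *s)
{
    return hash_mul(s, 0, 31);
}

/* djb2: start at 5381, multiply by 33. */
uint32_t djb2(const char *s)
{
    return hash_mul(s, 5381, 33);
}
```

The result still has to be reduced to a table index, for example with a modulo by the table capacity.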

Great stream and really good pieces of advice at the end!!

rodelias

43:20 Also, subtraction can cause an overflow, for example if you're doing (-big_number) - (big_number).

loic.bertrand

Keeping my inspiration up with these vids 🫡

sanjaux

I'll leave a like; this will surely come in handy at university.

slava