How Huffman Trees Work - Computerphile

How do we derive the most compact codes for a situation? Huffman Trees can help. Professor Brailsford explains how computer scientists like their trees to be upside down.

This video was filmed and edited by Sean Riley.

Comments

I just love his enthusiasm and ability to simplify things that might otherwise be difficult to understand

TonyManso

"And now a bit on chemistry...." The man's level of knowledge is astounding lol.

Cathalion

I don't know what I'm looking at after the tree diagram. I would love to learn and take notes on what this all means. I watch Computerphile in my spare time when I am not editing videos, photos and other work for clients. It's good for the mind!

jonnypanteloni

Just wrote a file compression program in Smalltalk for my class. I love how he explains the algorithm.

bobbyd

I used H for enthalpy in my chemistry classes, S for entropy and Q for heat

AvZNaV

I was a little confused at the start of the video - I thought I'd skipped a bit by mistake.
But then, almost immediately, I remembered about that good ol' weather station and it all came clear.
Maybe there should've been a "Previously on Computerphile..." intro sequence

AlanKey

Everything that could potentially be a prefix of a code is contained within the tree as a node that splits. Only the nodes that don't split are assigned a code. In the example you can see that "1", "11", and "111" are all assigned to nodes that split, meaning they are prefixes for codes and therefore cannot themselves be codes.

pdieraue
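
The point in the comment above can be checked mechanically. The following is a minimal sketch, assuming a code table whose splitting nodes are "1", "11" and "111" as described; the symbol names `A` to `E` and both helper functions are hypothetical placeholders, not taken from the video.

```python
# Hypothetical code table: only the bit patterns follow the comment,
# the symbol names are placeholders.
CODES = {"A": "0", "B": "10", "C": "110", "D": "1110", "E": "1111"}


def proper_prefixes(codes):
    """Return every non-empty bit string that is a strict prefix of a codeword.

    These correspond to the nodes of the tree that split (the root is omitted).
    """
    return {code[:i] for code in codes.values() for i in range(1, len(code))}


def is_prefix_free(codes):
    """True when no codeword is also a strict prefix of another codeword."""
    return proper_prefixes(codes).isdisjoint(codes.values())


print(sorted(proper_prefixes(CODES)))  # the splitting nodes
print(is_prefix_free(CODES))
```

A table that hands a code to a splitting node, such as `{"Cod": "0", "Bass": "1", "Tuna": "10"}`, fails the same check because "1" is both a codeword and a prefix of "10".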

Is that kind of reel paper still in use or is that all old stock? I can't remember the last time I saw those types of printer paper. I thought they were only used in dot matrix printers.

Disthron

Peter's reply is right, but to explain it another way, Huffman codes are "instantaneously decodable". Your codes are not. Take the bitstream 011001. Under your coding scheme, does this mean Cod (0), Bass (1), Tuna (10), Cod (0), Bass (1) or Cod (0), Bass (1), Barracuda (100), Bass (1)? An instantaneously decodable code is one in which, as you read the bit stream, once you have a match, that's your answer, and you know where the start of the next code word is.

parkamark
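
To make the ambiguity above concrete, here is a small sketch that lists every way a bit string can be split into codewords. The function name `all_parses` is hypothetical, and the ambiguous table contains only the codes quoted in this thread (Cod, Bass, Tuna, Shark, Barracuda); the prefix-free table is an assumed alternative for comparison.

```python
def all_parses(bits, codes):
    """Return every sequence of symbols whose codewords concatenate to `bits`."""
    if not bits:
        return [[]]  # exactly one way to parse nothing
    parses = []
    for symbol, code in codes.items():
        if bits.startswith(code):
            for rest in all_parses(bits[len(code):], codes):
                parses.append([symbol] + rest)
    return parses


# Codes as quoted in the comments; codewords sit on splitting nodes.
AMBIGUOUS = {"Cod": "0", "Bass": "1", "Tuna": "10", "Shark": "11", "Barracuda": "100"}

# Assumed prefix-free alternative: codewords only on leaves.
PREFIX_FREE = {"Cod": "0", "Bass": "10", "Tuna": "110", "Shark": "1110", "Barracuda": "1111"}

for parse in all_parses("011001", AMBIGUOUS):
    print(parse)

print(all_parses("0101100", PREFIX_FREE))
```

With the ambiguous table the loop prints several competing readings, including the two given in the comment; with the prefix-free table there is only ever one.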

I would like to have this gentleman as my teacher! So inspiring!

Cyberkygen

Does anyone have a link to the original 12 page paper David Huffman published in 1952? Thanks!!

timsiwula

I believe he explained it in the first video on this topic.

When you are sending the message, if at any point the sent message is equal to one of the messages that they recognise, it stops.

So in your example, if they wanted to send Shark, they would send 11. However, in order to send 11, you must first send the first 1, and then the second 1. After the first 1 has been sent, it recognises that as Bass, and stops.

JimCullen

My last computer science project of the year was a Huffman compression encoder/decoder. My solution was so hacked together. The provided C++ code contained C code, which works perfectly but is not exactly good coding practice. Then my group member did his own solution. My code was the buggiest thing ever. It took weeks to get it right and for the automarker to accept it.

Keduce

Why don't you do Shannon-Fano as well? ZIP's older Implode method used it (DEFLATE itself uses Huffman codes), so it has practical appeal. Though not quite optimal. :)

samposyreeni

I think what's really missing here is the biggest application of Huffman trees: text compression. We use a similar system, replacing probabilities with character counts, but the big thing is the prefix-free code. We can generate a unique binary code for each character by recording the path taken through the tree to reach it. It's a good topic for another video/sequel to this one.

dooge
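
The text-compression idea above can be sketched in a few lines. This is an illustrative version only, assuming character counts as weights and Python's `heapq` as the priority queue; the function names `huffman_codes`, `encode` and `decode` are hypothetical, and bits are kept as a string of "0"/"1" characters for readability rather than packed into bytes.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Build a prefix-free code table from the character counts in `text`."""
    freq = Counter(text)
    if not freq:
        return {}
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(n, next(tiebreak), ch) for ch, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs one bit
    while len(heap) > 1:
        # Merge the two least frequent subtrees into one splitting node.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):  # splitting node: recurse into both children
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:  # leaf: the path from the root is the codeword
            codes[node] = path

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    lookup = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # first match is the answer in a prefix-free code
            out.append(lookup[current])
            current = ""
    return "".join(out)


codes = huffman_codes("abracadabra")
bits = encode("abracadabra", codes)
print(codes)
print(len(bits), "bits instead of", 8 * len("abracadabra"))
```

A real compressor would also have to store the tree or the counts alongside the bits so the receiver can rebuild the same table.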

If you look at your example, you will find that the only entry that is not ambiguous is the first one; the others don't represent valid data.
For example, 011 could represent Cod (0) followed by either Bass (1), Bass (1) or Shark (11). The Huffman algorithm prevents that from happening. Another way to put it: valid codes sit only on the leaves of the tree, whereas your example also puts codes on the branches.
One more thing: if you don't keep your tree small, you may end up with an output bigger than the input.

MrCOPYPASTE

Title confused me, because I'm so used to seeing upside down trees!

cyberbss

No problem :)

The receiver might then have seen 11 at the end of a packet and know another 1 or 0 is needed to make sense of it. They'll wait for the next packet to arrive and add that to the buffer. Basically they'll always consume the buffer a bit at a time until they get a code they recognise.

For the termination code, if you can't send a header because you don't know how much you'll send, you'd add a code with the lowest possible probability (it occurs once in the whole stream).

Kelimion
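
The buffering behaviour described above might look roughly like this. It is a sketch under assumptions: the code table, the `<end>` marker and the `StreamDecoder` class are all hypothetical, and packets are modelled as strings of "0"/"1" characters.

```python
EOS = "<end>"  # hypothetical termination symbol, given one of the longest codes

# Assumed prefix-free table; not the one from the video.
CODES = {"Cod": "0", "Bass": "10", "Tuna": "110", EOS: "111"}


class StreamDecoder:
    """Consumes bits one at a time and keeps unfinished bits between packets."""

    def __init__(self, codes):
        self.lookup = {code: symbol for symbol, code in codes.items()}
        self.buffer = ""
        self.finished = False

    def feed(self, packet):
        """Decode as many symbols as the packet allows and return them."""
        symbols = []
        for bit in packet:
            if self.finished:
                break  # ignore anything after the termination code
            self.buffer += bit
            symbol = self.lookup.get(self.buffer)
            if symbol is None:
                continue  # not a full codeword yet, wait for more bits
            self.buffer = ""
            if symbol == EOS:
                self.finished = True
            else:
                symbols.append(symbol)
        return symbols


decoder = StreamDecoder(CODES)
print(decoder.feed("01011"))  # the trailing "11" stays in the buffer
print(decoder.feed("0111"))   # the next packet completes it, then ends the stream
```

Because the code is prefix-free, the decoder never has to look ahead: a half-received codeword simply waits in `buffer` until the next packet arrives.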

7:00 Need an example sea creature? Why not Zoidberg?

UB

The advantage is that with this algorithm it's impossible to create an ambiguous code, so you can put a lot of codewords next to each other and the original message remains perfectly clear.
With your example, if someone sent you 10, you wouldn't know whether it's 1 tuna or 1 bass and 1 cod.

sanko