What is Machine Learning? Part 3/5: Decision Tree Algorithm explained - Entropy

In this video, we are going to talk about an extremely important part of the decision tree algorithm, namely the metric that tells us where to best split the data. Or in other words, the metric that tells us what questions to ask. And the particular metric that we are going to look at is called entropy.

Edit: in the formula of the overall entropy at 16:35, "Entropy" should also have a subscript "j"

Comments

Clarification on Information Gain vs Overall Entropy (17:45)

The formula for Information Gain is basically like this:
Information Gain = Entropy before split – weighted Entropy after split

Or in other words:
Information Gain = Entropy before split – Overall Entropy

So, to determine the Entropy before the split, we need to calculate the following:
Entropy before split = 42/130 * (-log2(42/130)) + 42/130 * (-log2(42/130)) + 46/130 * (-log2(46/130)) ≈ 1.584
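As a quick check, here is a minimal Python sketch of that entropy calculation, using the class counts 42, 42, and 46 from the video (the function name is my own choice for illustration):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Class counts before the split: 42, 42, 46 (130 samples in total)
print(round(entropy([42, 42, 46]), 3))  # → 1.584
```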

So, the Information Gain for split 1 is:
Information Gain = 1.584 – 0.646 = 0.938

And the Information Gain for split 2 is:
Information Gain = 1.584 – 0.804 = 0.780

So, split 1 results in a higher Information Gain, and we would choose it over split 2. Therefore, we get the same result as in the video, where we just used Overall Entropy.
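Taking the two Overall Entropy values from the video (0.646 for split 1 and 0.804 for split 2) as given, the comparison can be sketched like this; `information_gain` is just an illustrative helper, not code from the video:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(counts_before, overall_entropy_after):
    """Entropy before the split minus the weighted (overall) entropy after it."""
    return entropy(counts_before) - overall_entropy_after

before = [42, 42, 46]  # class counts before the split
gain_split_1 = information_gain(before, 0.646)  # ≈ 0.938
gain_split_2 = information_gain(before, 0.804)  # ≈ 0.780

# The split with the higher gain (split 1) is preferred, matching the video.
print(round(gain_split_1, 3), round(gain_split_2, 3))
```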

The reason I decided to just use Overall Entropy and not Information Gain is because they are essentially the same. With Overall Entropy you focus on the fact that the entropy decreases from 1.584 to 0.646 after the split. And with Information Gain you focus on the fact that the entropy of 1.584 decreases by 0.938, down to 0.646, which is exactly the Information Gain of the split.

In my opinion, using Overall Entropy is simply more intuitive. Additionally, it requires one less calculation step.

SebastianMantey

I can't thank you enough for sharing such valuable information !!

hazemahmed

Excellent explanations. Want to see more videos from you.

rajatpati

Thank you. Very clear explanation and visualisations.
Would be great if you said a few words on how these are used for regression.

SuperIdo