Decision tree fundamentals - Gini impurity and entropy | ML foundations | ML in Julia [Lecture 15]

Gini impurity measures how often a randomly chosen element from a dataset would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the dataset.
Gini impurity = 1 - Σ p_i^2
Here p_i is the probability of the i-th class.
Compare this with entropy:
Entropy = -Σ p_i * log2(p_i)
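
As a minimal sketch in Julia (the function and variable names below are my own, not taken from the lecture), both measures can be computed from the vector of class labels at a node:

# Class probabilities p_i estimated from a vector of labels
function class_probs(labels)
    counts = Dict{eltype(labels),Int}()
    for y in labels
        counts[y] = get(counts, y, 0) + 1
    end
    return [c / length(labels) for c in values(counts)]
end

# Gini impurity = 1 - Σ p_i^2
gini_impurity(labels) = 1 - sum(p^2 for p in class_probs(labels))

# Entropy = -Σ p_i * log2(p_i)  (only classes that actually occur are included, so p_i > 0)
shannon_entropy(labels) = -sum(p * log2(p) for p in class_probs(labels))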
When you have only one class, p_i = 1.
Then Gini impurity = 0 = entropy.
If you have many classes (n of them) with one item in each class, then each p_i = 1/n, so
Gini impurity = 1 - 1/n → 1 (as n grows)
Entropy = log2(n)
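
Using the sketched functions above, these two limiting cases can be checked numerically:

# Pure node: a single class -> both measures are 0
gini_impurity(fill("a", 10))        # 0.0
shannon_entropy(fill("a", 10))      # -0.0, i.e. 0

# n classes with one item each: every p_i = 1/n
n = 16
labels = collect(1:n)               # each class appears exactly once
gini_impurity(labels)               # 1 - 1/n = 0.9375
shannon_entropy(labels)             # log2(n) = 4.0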
Both Gini impurity and entropy capture the same idea, but Gini impurity is simpler to calculate because it avoids the logarithm.
If all data points in a node belong to the same class, its Gini impurity = 0. Such nodes are called “pure leaves”.
So when should you use Gini and when should you use entropy?
⇒ Use Gini when computational efficiency is critical, or when class proportions are imbalanced.
⇒ Use entropy when you want a deeper view of the information gain, or when you are dealing with balanced data and prefer a probabilistic approach.