Why Do We Use the Sigmoid Function for Binary Classification?

This video explains why we use the sigmoid function in neural networks for machine learning, especially for binary classification. On the practical side, we look at how the sigmoid gives a consistent gradient under the standard categorical loss function and keeps the equation easy to compute. On the statistical side, we give an interpretation of the logits (the values passed into the sigmoid function): they can be thought of as normally distributed values whose means are shifted one way or the other depending on which class they belong to.
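
As a rough illustration of the two points in this description, here is a minimal Python sketch. It is not code from the video. The function names, the choice of binary cross-entropy as the loss, and the assumption of equal class priors with equal-variance normal distributions centred at +mu and -mu are all made up for this example. Under those assumptions, the gradient of the loss with respect to the logit works out to sigmoid(z) - y, and the Bayes posterior for class 1 is a sigmoid of a scaled input.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_loss(z, y):
    """Binary cross-entropy for a logit z and a label y in {0, 1}."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior_class1(x, mu, std):
    """P(class 1 | x) via Bayes' rule.

    Assumption for this sketch: equal priors, class 1 ~ N(+mu, std^2),
    class 0 ~ N(-mu, std^2).
    """
    p1 = gaussian_pdf(x, mu, std)
    p0 = gaussian_pdf(x, -mu, std)
    return p1 / (p0 + p1)

# 1) Gradient of the loss with respect to the logit is sigmoid(z) - y.
z, y = 0.7, 1
eps = 1e-6
numeric_grad = (bce_loss(z + eps, y) - bce_loss(z - eps, y)) / (2 * eps)
analytic_grad = sigmoid(z) - y

# 2) The posterior under the two shifted normals equals a sigmoid
#    of the scaled input 2 * mu * x / std^2.
x, mu, std = 0.4, 1.0, 1.0
bayes = posterior_class1(x, mu, std)
via_sigmoid = sigmoid(2 * mu * x / std ** 2)

print(numeric_grad, analytic_grad)
print(bayes, via_sigmoid)
```

The two printed pairs should agree up to floating-point error. The gradient sigmoid(z) - y is simply the prediction error, which is one way to read the "consistent gradient" mentioned above.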

My other video, "Derivative of Sigmoid and Softmax Explained Visually":

The Desmos graph of the sigmoid function:

🎵 Kazukii - Return
Comments
I like the visualization of the two normal distributions with the sigmoid function, very cool, never seen before :-)

TerragonDE
You are great! Thanks for making this so much easier to understand. I had a hard time understanding this while I was studying 6 years ago, but with all these great visualizations it all makes so much sense!

danielwie
Thanks for going over that with so much visual detail! I just heard about the swish activation function, would love to see your take on it!

carnright
Another reason is that the 1st derivative of the sigmoid function is a function of itself, making calculation of weight corrections in back-prop computationally efficient. Great video BTW!

neuromancer
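
The identity mentioned in the comment above can be sketched as follows. This is an illustrative Python snippet rather than anything shown in the video, and the helper name `sigmoid_grad_from_output` is hypothetical. It checks that the derivative of the sigmoid can be written as s * (1 - s), where s is the sigmoid's own output, so a backward pass can reuse the value already computed in the forward pass.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad_from_output(s):
    """Derivative of the sigmoid expressed through its output s = sigmoid(z)."""
    return s * (1.0 - s)

z = 0.3
s = sigmoid(z)                        # computed once in the forward pass
reused = sigmoid_grad_from_output(s)  # backward pass reuses s, no extra exp()

# Central finite difference as an independent check of the identity.
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(reused, numeric)
```

The two printed values should match closely, and the derivative peaks at 0.25 when the output is 0.5.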

This has to be the best explanation on the web for the SF.

cornelisderuiter
Excellent explanation, it really helped me understand the concept! I honestly don't think it could've been explained better!

aryang
Thank you, Elliot! This is brilliant content. It helped me understand this in a more intuitive way.

shvprkatta
The branches example was so cool! Felt mentally transported to a foggy forest so as to observe the dripping dew drops.

HeduAI
Many, many thanks. I've been thinking about the reason for quite some time...

taiwanSmart
I'm in love with your keyboard, and thanks for the video.

Kikikuku
Thanks for this comparison of the different functions and for explaining why you/we use the sigmoid function. Brilliant content; glad I found it. One small issue (well, not so small really): I could have missed the great content because, to native UK English ears, I had no idea what you were talking about when you mentioned the "Lawssssed" function, and it seemed really important. After I persevered and FINALLY looked at the legend in full screen, I realised that what you called a "lawsssed" function is what we call the "Lost Function" (it has a T at the end here in the UK). Thanks again. You might consider adding subtitles for UK viewers, or for other English accents that pronounce ST as if there is a T in it.

kennymaccaferri
Dear Elliot, please do more ML videos. You give the most intuitive explanations. Love your content!

ilkero
Awesome explanation. I use the normal distribution for trading and am working on an ML system at the moment.

bruceb
Thank you! I'm learning neural networks through self-study. This is the answer to the question I had.

jennyjumpjump
Fun fact: the Sigmoid function was first introduced by Jack Cowan to model experiments on real neurons.

CalculationConsulting
Awesome! Thank you so much! This video was so intuitive!🙌

ankitdixit
Amazing video! How did you get that intuition of the two normal distributions? Any links to books or articles would help. Thanks for the video, it's another level <3

gauravms
Hi, nice visualisation and explanation. I thought you were a DJ too. Your explanation of why the sigmoid is chosen over the other equations is a bit unclear to me, especially in the upper-left quadrant. Also, for your foggy forest analogy, what does it mean for the sigmoid function? Does it show where the raindrop will end up on average? How is this related to a binary option if the option is between 0 and 1?

shassy
If you want other perspectives, search for the video: why sigmoid: a probabilistic perspective. The GDA perspective is mentioned as well but it's not a full formulation, more like a motivating example.

NierAutomataB
What is the CPU/RAM extension you're running at the top of the MacBook toolbar?

InquilineKea