From Scratch: How to Code Logistic Regression in Python for Machine Learning Interviews!

Logistic Regression in Python | Batch Gradient Descent | Mini-batch Gradient Descent | Data Science Interview | Machine Learning Interview

📚 Derivation of gradients

🟢 Get all my free data science interview resources

// Comment
Got any questions? Something to add?
Write a comment below to chat.

// Let's connect on LinkedIn:

====================
Contents of this video:
====================
0:00 Intro
0:42 Logistic Regression Basics
2:22 Logistic Regression Deep Dive
6:41 Logistic Regression Implementation
10:16 Mini-batch Gradient Descent
Comments
Author

I got the job! Thanks so much for these great videos, Emma! Having these as study materials helped me practice in ways that are really hard to duplicate, especially when everyone is remote these days. I think I went through every video about three times, and several of the ideas you raised in your videos were brought up by the interviewers in the on-site. The hypothetical questions were the best parts because I could pause, practice answering the question myself, then follow your input as a way to get feedback. I’m now recommending this channel to anyone in my DataSci network looking to brush up on interview practice. This channel and the StatQuest channel made the difference, you rock!

willjohnson
Author

Even though logistic regression is used for classification, it is still a regression! Only selecting the decision boundary/threshold (which may be different from 0.5) makes it a classification algorithm.
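For illustration, a minimal sketch of that distinction (the function names here are my own, not from the video): the model itself produces a real-valued probability, and only the chosen threshold turns it into a hard class label.

```python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    # The "regression" part: a real-valued probability estimate.
    return sigmoid(np.dot(x, w) + b)

def predict_class(x, w, b, threshold=0.5):
    # The "classification" part: only the threshold turns the
    # probability into a hard 0/1 label.
    return int(predict_proba(x, w, b) >= threshold)
```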

serhiidyshko
Author

Similar to the linear regression implementation video, I still have problems with the argument that "the sign used to update the gradient depends on how you set up the loss function". I don't think it does; I'm thinking it should always be param -= gradient * LR.

At 9:23, there's a discussion of derror_dy = pred - y[i] vs y[i] - pred. How do these two expressions relate to the log loss at 5:10? Where in the model formulation did we have the freedom to set it up as pred - y[i] or y[i] - pred (thus leading to the 9:12 point you want the audience to pay attention to)?
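For what it's worth, a small sketch of how the two conventions relate (my own variable names, not the video's exact code): with the log loss L = -[y*log(p) + (1-y)*log(1-p)] and p = sigmoid(w·x), the gradient with respect to w is (p - y)*x, so the update is w -= lr * (p - y) * x. If you instead call (y - p) the "error", the very same step is written w += lr * (y - p) * x. The loss function is identical either way; only where the minus sign lives changes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One labeled data point and a current weight vector (made-up values).
x = np.array([1.0, 2.0])
y = 1.0
w = np.array([0.1, -0.3])
lr = 0.1

p = sigmoid(np.dot(w, x))

# Convention 1: the gradient of the log loss is (p - y) * x, so subtract it.
w1 = w - lr * (p - y) * x

# Convention 2: define the "error" as (y - p) and add it.
w2 = w + lr * (y - p) * x

print(np.allclose(w1, w2))  # True -- both conventions give the same step
```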

Han-veuh
Author

Thank you so much for making these videos 🥺 this weekend I'll watch them all and take notes. These are so helpful 🥰

jairocarreon
Author

Really clear and easy to understand description. 👍

saeidsamizade
Author

Thanks Emma, I am learning regression, p and residual, etc.

lydiamai
Author

When computing gradient descent on mini-batches, is it okay to draw random data points with replacement? Because of the randint method, you could get the same data point multiple times.
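If the repeats are a concern, one possible alternative (a sketch of my own, not the video's code) is to draw each mini-batch without replacement using NumPy's choice:

```python
import numpy as np

def sample_minibatch(X, y, batch_size, rng=None):
    # Draw batch_size distinct row indices, so no data point repeats
    # within a single mini-batch.
    rng = rng if rng is not None else np.random.default_rng()
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return X[idx], y[idx]
```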

TheACG
Author

Hi Emma, thank you for the great video! Does this depth of knowledge also apply to data analyst interviews? Thanks!

nan
Author

Could you explain a little more about the learning rate and how to optimize this parameter?
Thanks!

梅鹏飞-ge
Author

In practice, when computing on a GPU, GD is actually faster than batch GD, because GD can be computed in parallel, whereas batch

tiantiantianlan
Author

Hi Emma, thanks for the great video! I wonder if during the interview they would allow us to use python modules like "numpy" to help implement the algorithm. Do you know if that is usually allowed?

nataliatenoriomaia
Author

I saw some textbooks and documents mention using iteratively reweighted least squares or Newton's method to get estimates of parameters. What are the differences between these two and Gradient Descent? Thank you!

jiayiwu
Author

In mini-batch gradient descent, how do you ensure you do not sample data points that you have already processed? It doesn't seem like the code handles that.
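One common pattern (again a sketch of my own, not necessarily what the video's code does) is to shuffle the indices once per epoch and then walk through them in consecutive slices, so every data point is processed exactly once per pass over the data:

```python
import numpy as np

def minibatch_indices(n_samples, batch_size, rng=None):
    # Shuffle once per epoch, then yield consecutive slices, so each
    # data point appears in exactly one mini-batch per epoch.
    rng = rng if rng is not None else np.random.default_rng()
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

# Usage for one epoch of updates (X, y assumed to be NumPy arrays):
# for idx in minibatch_indices(len(X), batch_size=32):
#     X_batch, y_batch = X[idx], y[idx]
```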

Bookerer
Author

How do you remember the gradient formula at 6:10 (to use it in an interview)?
Every data scientist can derive it in 10 minutes if they are **not** under pressure, but in an interview setting you'd better remember it. Any tips?
---
BTW, for those who like to vectorize - this formula can be succinctly written as:
dJ/db = sum_over_i ( p(x_i) - y_i ) x_i
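A hedged NumPy version of that vectorized formula (variable names are mine, not the video's): with X of shape (m, n), the gradient with respect to the coefficient vector is X.T @ (p - y), and the intercept gradient is the plain sum of (p - y).

```python
import numpy as np

def logistic_gradients(X, y, w, b):
    # X: (m, n) data matrix, y: (m,) labels in {0, 1},
    # w: (n,) coefficient vector, b: scalar intercept.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities p(x_i)
    error = p - y                           # (p(x_i) - y_i) for every i
    grad_w = X.T @ error                    # sum_i (p(x_i) - y_i) * x_i
    grad_b = error.sum()                    # sum_i (p(x_i) - y_i)
    return grad_w, grad_b
```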

hl
Author

When x = all our independent features, shouldn't n = len(x) and m = len(x[0])? Because you say that n is the number of dimensions/features and m is the number of data points, but in your code n and m are the other way around.
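For reference, a tiny sketch of the usual layout when x is stored as a list of data points (this is an assumption about the layout, not necessarily how the video's code stores x):

```python
# x as a list of data points, where each data point is a list of features.
x = [
    [1.0, 2.0, 3.0],  # data point 0, with 3 features
    [4.0, 5.0, 6.0],  # data point 1
]
m = len(x)     # number of data points  -> 2
n = len(x[0])  # number of features     -> 3
```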

MrReapzzGaming
Author

Could you explain what happens if we set the derivative of the loss function to 0? Why can't we do that?

thampasaurusrex