Linear Regression with Gradient Descent + Least Squares

Linear regression is a powerful statistical tool for data analysis and machine learning. It assumes a hypothesis (model) that is a linear combination of the independent variables. Using gradient descent, the parameters of the model can be learned from the available data; both the batch gradient descent and stochastic gradient descent algorithms are detailed here. Another way of looking at linear regression is through linear least squares, and we explore that solution as well.
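To make the two approaches concrete, here is a minimal sketch (not the code from the video) that fits a line to made-up data with batch gradient descent and then checks the answer against the least-squares normal equation; the data, learning rate, and iteration count are all assumptions:

import numpy as np

# Made-up data: y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=50)

# Design matrix with a leading column of ones for the intercept term.
X = np.column_stack([np.ones_like(x), x])
m = len(y)

# Batch gradient descent on J(theta) = (1/(2m)) * sum((X @ theta - y)**2).
theta = np.zeros(2)
alpha = 0.01
for _ in range(5000):
    gradient = X.T @ (X @ theta - y) / m  # gradient averaged over the full batch
    theta -= alpha * gradient

# Least-squares solution via the normal equation, for comparison.
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)
print(theta, theta_ls)  # both should be close to [1, 2]

Both routes should agree here; gradient descent scales better to very large data sets, while the closed form is exact in one shot for small ones.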


Comments

Less than 1000 views?! This video deserves far more. This was amazing.

neonlearn

This tutorial helped me understand the cost function easily.

melakuhailelikka

Thanks for making linear regression interesting and helping develop intuition. Appreciate your effort!

chakkadi

Thank you for this clear explanation of linear regression with gradient descent; it was a very well-thought-out class. For the first time I feel like I understand!

karinwiberg

I really like how you explained convergence at 9:00.

Christian-mndh

May I ask how the 2 in "2/2m" at 17:34 ends up there? Where does it come from? I understand that it's part of the derivative in the non-vectorized solution, but where does it come from in the vectorized one?

karinwiberg
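For reference, a short worked step showing where a factor like this usually comes from, assuming the cost used in the video is the standard

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2

(an assumption on my part; the video's exact notation may differ). Differentiating the square by the chain rule brings down a 2:

\frac{\partial J}{\partial \theta_j} = \frac{2}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x_j^{(i)}

The vectorized form is the same computation: the sum of squares is the dot product of the residual vector with itself, \lVert X\theta - y \rVert^2 = (X\theta - y)^\top (X\theta - y), and differentiating it brings down the identical factor of 2, giving \nabla J = \frac{2}{2m} X^\top (X\theta - y) = \frac{1}{m} X^\top (X\theta - y).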

Wow, this is what I struggled with back in college, thinking through problem solving, hehe.

ladydollvlogs

Wow! The maths is explained very well. Please make some more videos!

spurthygopal

Hello guys, I'm a little confused here, so help me out. In the theoretical lecture, you take the gradient of J (the cost function) to be the summation of ((y_hat[i] - y[i]) * x[i]) / m over all i in (1, m). Only after that summation has been computed over all i do we calculate the new theta (parameters), by plugging in the summed gradient along with alpha and subtracting it from the current theta value.

But when I went through the code in the Jupyter notebook, in the function lin_reg_batch_gradient_descent, you calculate a new theta (parameter) for every i and then add it to the current value of theta.

So instead of this:

for x, y in zip(input_var, output_var):
    y_hat = np.dot(params, np.array([1.0, x]))
    gradient = np.array([1.0, x]) * (y - y_hat)
    params += alpha * gradient / num_samples

shouldn't it be (according to the theoretical lecture) this:

gradient = np.zeros(2)
for x, y in zip(input_var, output_var):
    y_hat = np.dot(params, np.array([1.0, x]))
    gradient += np.array([1.0, x]) * (y - y_hat)
params += alpha * gradient / num_samples

When I used the second piece of code, the gradient value becomes so big that I get an overflow error.

sundeepreddy
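For anyone comparing the two loops in the comment above, here is a minimal, self-contained version of the accumulate-then-update (batch) variant on made-up data; the data, alpha, and epoch count are assumptions, not the notebook's values. Note that the accumulated gradient is applied once per full pass, after the inner loop:

import numpy as np

# Made-up data: y = 3x + 2 plus noise.
rng = np.random.default_rng(1)
input_var = rng.uniform(0, 1, size=100)
output_var = 3.0 * input_var + 2.0 + rng.normal(0, 0.1, size=100)
num_samples = len(input_var)

params = np.zeros(2)  # [intercept, slope]
alpha = 0.5           # assumed learning rate

for _ in range(2000):
    # Accumulate the gradient over the whole data set...
    gradient = np.zeros(2)
    for x, y in zip(input_var, output_var):
        y_hat = np.dot(params, np.array([1.0, x]))
        gradient += np.array([1.0, x]) * (y - y_hat)
    # ...then take one averaged step per pass (the batch update).
    params += alpha * gradient / num_samples

print(params)  # should land near [2, 3]

One common cause of the overflow described above is leaving the params update inside the inner loop while gradient keeps accumulating: each step then grows roughly with the running sum and the iterates diverge.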

When we use simple linear regression or multiple linear regression, does it use OLS by default or gradient descent to find the best-fit line? Please answer my question.

bhartichambyal
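As an illustration of the closed-form (OLS) route, here is a direct least-squares fit with numpy's lstsq, with no iteration involved; the data are made up, and whether a particular library solves regression with OLS or gradient descent is implementation-specific:

import numpy as np

# Made-up data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Ordinary least squares via a direct solver (no learning rate, no iterations).
X = np.column_stack([np.ones_like(x), x])
coeffs, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)  # [intercept, slope], roughly [1.1, 1.96]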

I think the xbar vector is in the space R^(n+1), not R^n.

arda
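For context, with n original features the augmented input that absorbs the bias term is

\bar{x} = \left[ 1, x_1, x_2, \dots, x_n \right]^\top \in \mathbb{R}^{n+1}

so the reading above, that xbar lives in n+1 dimensions rather than n, matches the standard convention.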

Can I ask you one thing: consider that I have a dataset with many different features (counts of bacterial phyla), and the dependent variable is whether the subjects have a disease. I understand this is not linear, but let's say it were. I am wondering: instead of [1, x, x^2, ..., x^n], could you use [feature1, feature2, feature3, ..., featureN] and treat the function as a linear combination of the features? Or am I completely wrong here? Mainly, I'm trying to understand how the rows and columns of my dataset relate to regression.

louisebuijs
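In case it helps to see how rows and columns map onto the model, below is a minimal sketch with made-up feature names and numbers, where each row is one subject and each column is one feature, so the hypothesis is a linear combination of the feature columns rather than powers of a single variable. (For a yes/no disease outcome, logistic regression would be the usual choice, but the data layout is the same.)

import numpy as np

# Made-up layout: one row per subject, one column per bacterial phylum count.
features = np.array([
    [12.0,  3.0, 40.0],   # subject 1: feature1, feature2, feature3
    [ 7.0,  9.0, 22.0],   # subject 2
    [15.0,  1.0, 35.0],   # subject 3
    [ 4.0, 11.0, 18.0],   # subject 4
])
outcome = np.array([1.0, 0.0, 1.0, 0.0])  # disease status per subject

# Prepend a column of ones: model = theta0 + theta1*f1 + theta2*f2 + theta3*f3.
X = np.column_stack([np.ones(len(features)), features])

# Least-squares fit of the linear combination (illustrative only).
theta = np.linalg.lstsq(X, outcome, rcond=None)[0]
print(theta)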

I have a query: how could I choose the initial theta on the curve so as to lower the computation time, rather than selecting it randomly?

ajiteshbhan

I don't understand: gradient descent for ML, and least squares for statistical learning? Then which one is better?

phuccoiinkorea

You should have used conventional symbols; this is making my brain hurt. Why use m when n could stand for the number of data points? Why use theta at all? Your explanation is good, but it could be better by leaving out complicated symbols. Why convert into a vector?

OpeLeke