Lec 2.2: Gradient Descent for Logistic regression
Welcome to this lecture about Gradient Descent in Logistic regression.
In the previous lecture, we introduced the cost function equation.
In this lecture, we will introduce Gradient Descent, the method used to learn the parameters w and b that minimize the cost function.
The cost function for logistic regression is convex, which means it has only one global minimum. To illustrate this, let us take a look at this graph.
On the x- and y-axes we see w and b, respectively. The surface is convex, and its lowest point is the global minimum, which represents the smallest value of the cost function, that is, the optimal values of w and b.
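As a reminder (this recap is mine, not part of the lecture slides), the cost function introduced in the previous lecture is the standard binary cross-entropy, which is usually written as:

J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right], \quad \text{where } \hat{y}^{(i)} = \sigma(w^{\top} x^{(i)} + b)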
So the question is, how do we run gradient descent to find the values of w and b?
1) First, we initialize w and b. Random values work, but in practice they are usually initialized to zero.
2) Second, we choose a learning rate (the step size). If it is too small, it will take very long to reach the global minimum; if it is too large, the updates can overshoot and never reach it. Tuning the learning rate is a field in itself.
3) We repeatedly update w and b by steps of size alpha until we reach the bottom, that is, until the algorithm converges to the global minimum. A small code sketch of this procedure follows below.
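Here is a minimal NumPy sketch of the three steps above (my own illustration, not the lecturer's code); the function names and the toy data are assumptions made for the example:

import numpy as np

def sigmoid(z):
    # logistic function: maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    # X: (m, n) feature matrix, y: (m,) labels in {0, 1}
    m, n = X.shape
    w = np.zeros(n)                      # step 1: initialize w and b (zeros here)
    b = 0.0
    for _ in range(num_iters):           # step 3: repeat until (approximate) convergence
        y_hat = sigmoid(X @ w + b)       # predictions for all m examples
        dw = X.T @ (y_hat - y) / m       # partial derivative of J with respect to w
        db = np.sum(y_hat - y) / m       # partial derivative of J with respect to b
        w -= alpha * dw                  # step 2: move by the learning rate alpha
        b -= alpha * db
    return w, b

# toy usage with two separable clusters (made-up data)
X = np.array([[0.1], [0.4], [2.0], [2.3]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = gradient_descent(X, y, alpha=0.5, num_iters=2000)
print(w, b)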
This is best illustrated by the following equations.
The update equation for w is: w := w - α · ∂J(w, b)/∂w. The alpha (α) is known as the learning rate, and the partial derivative term is the change of J with respect to w; it controls the movement along the slope. In other words, it tells us how much the function slopes in the w direction.
Similarly, the update equation for b is: b := b - α · ∂J(w, b)/∂b, where the derivative term tells us how much the function slopes in the b direction.
These derivatives are known as partial derivatives, since J is a function of two or more variables.
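For logistic regression with the cross-entropy cost above, these partial derivatives take the following closed forms (a standard result, stated here for reference rather than taken from the lecture slides):

\frac{\partial J(w, b)}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x^{(i)}, \qquad \frac{\partial J(w, b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)

These are exactly the dw and db computed in the code sketch above.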
Gradient Descent Illustration. Image from: