K Fold Cross Validation Understanding with an Example || Lesson 43 || Machine Learning
#machinelearning #learningmonkey
In this class, we discuss K-fold cross-validation and understand it with an example.
We walk through how K-fold cross-validation works step by step.
As we discussed in our previous classes, the dataset is divided into three parts:
1) Training data
2) Testing data
3) Validation data
The training data is used as input to the model.
After training the model on the training data, we use the validation data to calculate the model's accuracy.
If the accuracy is not good, we make some modifications to the model.
Then we train it again on the training data and calculate the accuracy again on the validation data.
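Below is a minimal sketch of this hold-out workflow, assuming scikit-learn with the iris dataset and a logistic-regression model purely for illustration.

```python
# A minimal sketch of the hold-out split described above (illustrative only).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First hold out the testing data, then split the rest into training and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Train on the training data only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Check accuracy on the validation data; if it is not good, we modify the
# model and repeat this train/validate cycle.
print("Validation accuracy:", model.score(X_val, y_val))
```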
The problem with this approach is that we are wasting the validation data.
We are not using the validation data for training.
As we discussed previously, machine-learning algorithms need a lot of data to generalize well.
To overcome this problem, K-fold cross-validation uses both the training and validation data for training.
If k = 5, K-fold divides the entire training data into five parts.
Take the first part for validation and the remaining parts for training.
Train the model and check the accuracy on the validation part.
Then take the second part for validation and the remaining parts for training, and calculate the accuracy again.
Repeat this until each of the five parts has been used for validation once.
The average of all the accuracies is taken as the final accuracy.
With this approach, all the data is involved in both training and validation.
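Here is a minimal sketch of this 5-fold procedure, again assuming scikit-learn with the iris dataset and a logistic-regression model as an illustrative setup.

```python
# A minimal sketch of 5-fold cross-validation as described above (illustrative only).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
accuracies = []

# Each iteration takes one part for validation and the remaining parts for training.
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[val_idx], y[val_idx]))

# The average of the five accuracies is taken as the model's accuracy.
print("Per-fold accuracies:", accuracies)
print("Average accuracy:", np.mean(accuracies))
```

The same loop can also be done in a single call with sklearn's cross_val_score.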
Which K is best?
Based on a lot of practical usage, the K value is taken between 5 and 10, depending on the size of the data.
For large datasets, 10 is the most common choice.
We should not make the parts very small or very large.
That's why ten works best in most cases.
GridSearchCV is a class in sklearn that we use extensively in machine learning.
K-fold cross-validation is implemented inside that class, as shown in the sketch below.
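A minimal sketch of GridSearchCV with its built-in K-fold cross-validation; the logistic-regression model and the grid of C values are only assumed for illustration.

```python
# A minimal sketch of GridSearchCV; the parameter grid here is an assumed example.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1.0, 10.0]}

# cv=5 runs 5-fold cross-validation for every parameter combination and
# reports the average validation accuracy for each one.
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```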
Link for playlists: