299 - Evaluating a sklearn model using KFold cross validation in Python

Code generated in the video can be downloaded from here:

Let us start by understanding binary classification the normal way most of us approach the problem, using sklearn (SVM). In this example, we split our dataset once into train and test groups.
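The single-split approach can be sketched as follows. This is a minimal sketch, not the video's exact code: it assumes the Wisconsin breast cancer dataset mentioned below and a MinMaxScaler + SVC pipeline.

```python
# Minimal sketch of the "normal" approach: one train/test split with an SVM.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Single split: 75% train, 25% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Scale features, then fit a support vector classifier
model = make_pipeline(MinMaxScaler(), SVC())
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```

The drawback of this approach is that the score depends on which rows happen to land in the test set, which is what motivates K-fold cross validation.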

We will then learn to divide the data using KFold splits.
We will iterate through each split to train and evaluate our model.
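The manual fold loop might look like this (a sketch under the same assumptions as above: breast cancer data and a MinMaxScaler + SVC pipeline, with 5 folds chosen for illustration):

```python
# Iterate over KFold splits manually, training a fresh model per fold.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for train_idx, test_idx in kf.split(X):
    # Rebuild the pipeline each fold so no state leaks between folds
    model = make_pipeline(MinMaxScaler(), SVC())
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("Per-fold accuracy:", np.round(scores, 3))
print("Mean accuracy:", np.mean(scores))
```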

We will finally use the cross_val_score() function to perform the evaluation.
It takes the model, the dataset, and the cross-validation configuration, and
returns a list of scores, one calculated for each fold.
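cross_val_score() collapses the manual fold loop into one call. A sketch, again assuming the breast cancer dataset and a scaler + SVC pipeline rather than the video's exact code:

```python
# cross_val_score wraps the per-fold train/evaluate loop.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(MinMaxScaler(), SVC())
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Returns one score per fold; the scaler is re-fit on each fold's
# training data only, because it lives inside the pipeline.
scores = cross_val_score(model, X, y, cv=kf)
print("Fold scores:", scores)
print("Mean:", scores.mean())
```

Putting the scaler inside the pipeline is what prevents test-fold data from leaking into the scaling statistics.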

KFold is a model validation technique.

Cross-validation across multiple folds lets us evaluate model performance more reliably than a single train/test split.

The KFold class in sklearn provides train/test indices for splitting data into train and test sets. It splits the dataset into k consecutive folds (without shuffling by default).
Each fold is then used once as a validation set while the k - 1 remaining folds
form the training set.

The split() method within KFold generates the indices that divide the data into training and test sets, producing k folds of roughly n_samples/n_splits samples each.
In each iteration, one fold is held out for testing and the remaining n_splits - 1 folds are used for training, so every fold serves as the test set exactly once.
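A toy example makes the index generation concrete (six illustrative samples, three folds, no shuffling, so the folds are consecutive):

```python
# Illustration of KFold.split() yielding train/test index arrays.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(6).reshape(6, 1)   # six samples -> folds of 6/3 = 2 samples
kf = KFold(n_splits=3)           # no shuffling: consecutive folds

for train_idx, test_idx in kf.split(X):
    print("train:", train_idx, "test:", test_idx)
# The first fold tests on samples [0, 1] and trains on [2, 3, 4, 5],
# the second tests on [2, 3], and the third tests on [4, 5].
```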

Wisconsin breast cancer example
Comments

You are one of the best data science teachers out there. Thanks for your good work and approach. You explain a wide range of topics very well.

Master_of_Chess_Shorts

I used this module a lot during my work. Thanks for these great free libraries; they make data scientists' lives easier. Most of the work is gluing the data to these libraries.

caiyu

Been enjoying this KFold series. Looking forward to the next one. Thanks.

newcooldiscoveries

Great video. Just to clarify, is the purpose of cross-validation to tune the hyperparameters of models on a variety of different train_test splits to avoid overfitting? Cheers!

Gingeey

Dear Sreeni, thank you so much for your work! Have a good one!

DmitriiTarakanov

Good post. By the way, how do we select the best model after cross-validation? I am more interested in regression than classification. Have you tried using a multivariate polynomial regression model so that we could establish an empirical relation?

maheshmaskey

Before doing cross-validation, shouldn't you use a dimensionality reduction technique to determine whether all the features are necessary for your training? Thanks in advance if you take the time to answer me!

guiomoff

Hi, thanks for the video. The generated code is not in the GitHub file you shared.

maryamshehu

How do I plot a ROC curve for the overall cross-validation?
I have been trying to plot one, but it throws an error, apparently because I get different counts of TPRs/FPRs on each fold, which prevents the plot from showing.


Nice video. One silly question: you are using MinMaxScaler in a pipeline, so how does cross_val_score know to apply the scaling to X_array? I ask because you never transform X_array through the pipeline yourself.

Athens

Hi sir, I have been trying to implement video classification using a CNN. All the tutorials out there are quite hard to follow, or maybe I got used to your detailed explanations. Please do a tutorial on how to load video data. Thanks for all the high quality content.

ajay

Hi, great video series. Can you start a series about medical image processing and ML, e.g. 3D MRI processing, avoiding leaky validation, etc.? It would be really useful because there aren't many resources.

malithabasuri