Cross Validation in Machine Learning

In this 365 Data Science tutorial, we focus on one of the most important practices in machine learning: cross validation, also known as N-fold or K-fold cross-validation. Cross validation is a technique used in many areas of machine learning. Here we use it specifically to choose the best set of parameters for our support vector classifier, but it is important to know the general meaning of cross validation and its different applications.
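
To make the parameter-selection use case concrete, here is a minimal sketch of picking support vector classifier parameters with 5-fold cross validation. It assumes scikit-learn is installed; the iris dataset and the candidate values for `C` and `kernel` are illustrative assumptions and are not taken from the video.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumption: a small built-in dataset stands in for "our labeled data".
X, y = load_iris(return_X_y=True)

# Assumption: an arbitrary, small grid of candidate parameters.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Every parameter combination is scored with 5-fold cross validation;
# the combination with the best mean validation accuracy is kept.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_   # e.g. a dict with keys "C" and "kernel"
best_score = search.best_score_     # mean cross-validated accuracy of that choice
print(best_params, round(best_score, 3))
```

The reported score is an average over the five validation folds, so it is less tied to one particular split than a single train/test evaluation would be.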

First, suppose you have a labeled dataset and want to train several machine learning models on it and choose the best-performing one. If we used only one training set, one of the models would likely perform better on it. However, that model often does not perform as well on a new dataset during testing as it did during training. In most cases, the reason is that the model has picked up some of the bias in the training data and, as a result, cannot generalize with the same accuracy to new data. In other situations, you want to train a machine learning model on your data but do not have a large enough sample.

You might think that in this data-driven age an abundance of data causes more problems than a shortage of it, which is true in most cases. However, in some areas, such as the life sciences, obtaining a large enough sample is often easier said than done. To obtain data, we need to devise experiments, which are often time-consuming and expensive.

This is where cross validation comes into play, as it allows us to effectively increase the amount of data available for training and evaluation without actually sampling new points. The idea is to split our data into N (or K) equal parts and then create new variations of it: each variation holds out a different part for validation and trains on the remaining parts. Watch until the end of the video to see exactly how this works, how to calculate the average performance of our algorithms across all datasets, and where else cross validation might be beneficial.
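
The splitting and averaging idea can be sketched in plain Python. This is a minimal illustration, not the code from the video: the helper names (`k_fold_splits`, `cross_validate_mean`, `fit_threshold`, `accuracy`), the toy one-dimensional dataset, and the simple threshold model are all hypothetical.

```python
import random
from statistics import mean

def k_fold_splits(n_samples, k, seed=0):
    """Return k (train_indices, validation_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits

def cross_validate_mean(fit, score, X, y, k=5):
    """Train on k-1 parts, score on the held-out part, and average the k scores."""
    scores = []
    for train_idx, val_idx in k_fold_splits(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return mean(scores), scores

# Hypothetical toy model: a threshold halfway between the two class means.
def fit_threshold(X, y):
    mean0 = mean(x for x, label in zip(X, y) if label == 0)
    mean1 = mean(x for x, label in zip(X, y) if label == 1)
    return (mean0 + mean1) / 2

def accuracy(threshold, X, y):
    predictions = [1 if x > threshold else 0 for x in X]
    return mean(1.0 if p == t else 0.0 for p, t in zip(predictions, y))

# Assumed toy data: two well-separated groups of ten points each.
X = [float(v) for v in range(10)] + [float(v) for v in range(100, 110)]
y = [0] * 10 + [1] * 10

average_score, fold_scores = cross_validate_mean(fit_threshold, accuracy, X, y, k=5)
print(average_score, fold_scores)
```

Every sample is used for validation exactly once and for training in the other K-1 rounds, which is how the same limited dataset yields K different train/validation variations.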

365 Data Science is an online educational career website that offers the opportunity to find your way into the data science world, no matter your previous knowledge and experience. This is why we have dedicated this channel to those who are completely new and curious to explore the world of data science. Once you have built basic theoretical knowledge, you can sign up for our comprehensive curriculum, where we have prepared numerous courses that suit the needs of aspiring BI Analysts, Data Analysts, and Data Scientists.
#crossvalidation #machinelearning #365datasciencetutorials