Machine Learning | Cross Validation | Random State in Train Test Split | ML | AI

Topic to be Covered - Importance of Random State in Train Test Split

Table of Contents
00:00 Introduction
00:14 Import pandas library
00:17 Import dataset using pandas read_csv function
00:39 Handle missing values
00:59 Extract features and labels
01:15 Label Encoding
01:37 Sampling - Train Test Split
02:10 Random State
03:20 Compare X_train value with the previous run when random_state remains the same
04:00 Change the value of random_state from 0 to 1
04:20 Compare X_train value with the previous run when random_state is changed from 0 to 1
06:29 random_state=None
07:28 Compare X_train value with the previous run when random_state=None

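Code Starts Here
================
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split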

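# df is the DataFrame loaded earlier with pd.read_csv (not shown in this excerpt)

# Get the rows that contain NULL (NaN)

# Fill the NaN values for Occupation, Employment Status and Employment Type

col = ['Occupation','Employment Status','Employement Type']

# Fill missing Age and Salary values with the column mean.
# Assigning the result back avoids the chained inplace=True call,
# which newer pandas versions warn about.
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())

# col1 = ['Age','Salary']

# ------------------------------- L A B E L   E N C O D I N G ------------------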

encode = LabelEncoder()

# S A M P L I N G

# random_state=None: the rows that land in train and test change on every run
X_train2, X_test2, y_train2, y_test2 = train_test_split(features,
                                                        labels,
                                                        test_size=.25,
                                                        random_state=None)
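
The comparison the video walks through (same random_state, a changed random_state, and random_state=None) can be sketched on toy data. The arrays below are hypothetical stand-ins for the video's features and labels, and split_train is a helper name introduced only for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the video's `features` and `labels`
features = np.arange(40).reshape(20, 2)
labels = np.arange(20) % 2


def split_train(seed):
    """Return only X_train for the given random_state (illustrative helper)."""
    X_train, _, _, _ = train_test_split(
        features, labels, test_size=0.25, random_state=seed)
    return X_train


same_seed_a = split_train(0)
same_seed_b = split_train(0)
other_seed = split_train(1)

# Same seed: identical rows in identical order on every run
print(np.array_equal(same_seed_a, same_seed_b))

# Different seed: a different shuffle, so the rows or their order differ
print(np.array_equal(same_seed_a, other_seed))

# random_state=None: a fresh shuffle on each call, so two calls rarely match
print(np.array_equal(split_train(None), split_train(None)))
```

The particular integer carries no meaning of its own; it only selects which shuffle is reproduced.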

All Playlists of This YouTube Channel
=====================================

1. Data Preprocessing in Machine Learning

2. Confusion Matrix in Machine Learning, ML, AI

3. Anaconda, Python Installation, Spyder, Jupyter Notebook, PyCharm, Graphviz

4. Cross Validation, Sampling, train test split in Machine Learning

5. Drop and Delete Operations in Python Pandas

6. Matrices and Vectors with python

7. Detect Outliers in Machine Learning

8. Time Series preprocessing in Machine Learning

9. Handling Missing Values in Machine Learning

10. Dummy Encoding in Machine Learning

11. Data Visualisation with Python, Seaborn, Matplotlib

12. Feature Scaling in Machine Learning

13. Python 3 basics for Beginner

14. Statistics with Python

15. Sklearn Scikit Learn Machine Learning

16. Python Pandas Dataframe Operations

17. Linear Regression, Supervised Machine Learning

18. Interview Questions on Machine Learning and Data Science

19. Jupyter Notebook Operations

Comments

Thanks for uploading. Can you please upload videos algorithm-wise, e.g. Decision Tree, KNN? (Your explanation is very good.)

venkataraokallagunta

Hi all,
Please note the following:
"from sklearn.cross_validation import train_test_split" is OBSOLETE now (the sklearn.cross_validation module was removed in scikit-learn 0.20).

Please use the following to import train_test_split:
from sklearn.model_selection import train_test_split

technologyCult

Sorry if I have this wrong, but you don't need to use numpy or remove the labels from the columns to use train_test_split?
I am doing the same thing: opening a DataFrame with pandas and splitting it into x and y, with y = df['column_I_need']. (I don't need to preprocess my dataset because it has only numeric data, with no NaN or strings.)
As far as I can see from your video, I am doing the same thing as you and my results are pretty good, but I am still not sure about this method because most people use numpy to generate x_train, x_test, y_train and y_test.

joswrezende
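
On the question above: train_test_split accepts pandas objects directly, so converting to numpy first is not required. A minimal sketch, assuming a hypothetical all-numeric DataFrame (the column names and values are invented for illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical all-numeric dataset with no missing values
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "feature_b": [10, 20, 30, 40, 50, 60, 70, 80],
    "target": [0, 1, 0, 1, 0, 1, 0, 1],
})

X = df.drop(columns=["target"])  # DataFrame of features
y = df["target"]                 # Series of labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The outputs keep their pandas types, column names and original index
print(type(X_train).__name__, type(y_train).__name__)
print(list(X_train.columns))
print(X_train.index.equals(y_train.index))
```

Keeping the data in pandas preserves the column names and the original row index, which can make it easier to trace a test row back to the source data.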

Suppose that with random_state=19 my model gets higher accuracy. Should I stick to that random state, i.e. should I deploy the model with that random_state? Or should my model perform well with all other random_state values?

antonyjoy
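
On the question above: an accuracy measured on one split partly reflects which rows happened to land in the test set, so one common check is to repeat the split with several seeds and look at the spread. A rough sketch, assuming the iris dataset and a decision tree as stand-ins (neither comes from the video):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; the video's own data is not used here
X, y = load_iris(return_X_y=True)

scores = []
for seed in range(10):
    # Only the split seed changes; the model seed stays fixed
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print(f"min={min(scores):.3f} max={max(scores):.3f} "
      f"mean={np.mean(scores):.3f} std={np.std(scores):.3f}")
```

If the score for one seed sits well above the mean, that seed most likely produced an easy test set rather than a better model.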

What is the difference between random_state=1 and random_state=12 (or any other number)?

piyushjain

I'm building a decision tree model in Python, and I set random_state to some "fixed" number. But every time I run the code (with random_state fixed), I get a different version of the model. Why is that?

edgarpanganiban
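
The commenter's code is not shown, so the cause can only be guessed. One possibility is that only one source of randomness was seeded: train_test_split and DecisionTreeClassifier each take their own random_state. A minimal sketch that seeds both, using the iris dataset as a stand-in and a helper name (fit_tree) introduced only for this illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in dataset; the commenter's data is not available
X, y = load_iris(return_X_y=True)


def fit_tree():
    """Split and fit with both sources of randomness seeded."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    # export_text gives a printable description of the learned rules
    return export_text(tree), tree.score(X_test, y_test)


first_tree, first_score = fit_tree()
second_tree, second_score = fit_tree()

# With both seeds fixed, the learned rules and the score repeat exactly
print(first_tree == second_tree, first_score == second_score)
```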

What is the use of random_state if we are going to shuffle the data with the shuffle=True parameter?

nishadt
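
On the question above: shuffle decides whether the rows are shuffled before splitting, while random_state decides which shuffle is used, so the two work together. A small sketch on a hypothetical array of eight values:

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(8)  # hypothetical data: the values 0..7

# shuffle=True (the default): random_state seeds the shuffle,
# so the same seed reproduces the same split
shuffled_a, _ = train_test_split(
    data, test_size=0.25, shuffle=True, random_state=0)
shuffled_b, _ = train_test_split(
    data, test_size=0.25, shuffle=True, random_state=0)
print(np.array_equal(shuffled_a, shuffled_b))

# shuffle=False: rows are taken in their original order,
# so no randomness is involved
ordered_train, ordered_test = train_test_split(
    data, test_size=0.25, shuffle=False)
print(ordered_train, ordered_test)  # [0 1 2 3 4 5] [6 7]
```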

Hi, how do I determine the random state for a dataset?

madhurjyadeka

So basically random_state must be a fixed integer, not None. Am I understanding it right?

akanshmishra