Logistic Regression for Classification | Working with a real-world dataset from Kaggle

🎯 Topics Covered
• Downloading a real-world dataset from Kaggle
• Splitting a dataset into training, validation & test sets
• Imputing and scaling numeric features
• Encoding categorical columns as one-hot vectors
• Training a logistic regression model using Scikit-learn
• Evaluating a model using a validation set and test set


⌚ Time Stamps:
00:00 Introduction
05:16 Problem Statement
25:43 Downloading a real-world dataset from Kaggle
35:35 Exploratory data analysis and visualization
47:06 Splitting a dataset into training, validation & test sets
01:03:04 Filling/Imputing missing values in numeric columns
01:21:55 Scaling numeric features to a (0, 1) range
01:28:10 Encoding categorical columns as one-hot vectors
01:39:02 Training a logistic regression model using Scikit-learn
01:53:41 Evaluating a model using a validation set and test set
02:19:38 Saving a model to disk and loading it back
02:36:28 Summary and Conclusion
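
The "saving a model to disk and loading it back" step can be sketched roughly as follows. This is a minimal illustration using joblib, which is an assumption here and not necessarily the method used in the video; the data and the file name are made up.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic dataset (made up for illustration, not the tutorial's data).
X = np.array([[0.0], [0.2], [0.8], [1.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# Save the fitted model to a temporary folder, then load it back from disk.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")  # hypothetical file name
joblib.dump(model, path)
loaded = joblib.load(path)

# The reloaded model should reproduce the original model's predictions.
same_predictions = bool((loaded.predict(X) == model.predict(X)).all())
```

When the model depends on a fitted imputer, scaler or encoder, those objects usually need to be saved with it so that new data can be preprocessed the same way.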

⚡ Free Certification Course

🎤 About the speaker
Aakash N S is the co-founder and CEO of Jovian - a community learning platform for data science & ML. Previously, Aakash has worked as a software engineer (APIs & Data Platforms) at Twitter in Ireland & San Francisco and graduated from the Indian Institute of Technology, Bombay. He’s also an avid blogger, open-source contributor, and online educator.

#GBM #MachineLearning #Python #Certification #Course #Jovian

Comments

That was intense!!!
This is probably the first time I have watched a tutorial this long without a break.
You are awesome, sir.

anuragthakur

Thanks a lot Aakash for the fabulous explanations and infectious passion to empower others. These tutorials are simply unmatched! Bravo!

kizzavincent

Nicely explained, Aakash and Jovian team. This was probably the most thorough and clearly explained tutorial I have come across.

TheAnugupta

This video is still one of the best. A literal game changer!

SillyLittleMe

Great video! I learned a lot! Thank you!

parastooaghr

Great explanation with reasonable depth for this topic. Such a great video.

hemangdhanani

Great content, Aakash sir, and free at that. Really amazed and impressed by Jovian!

ektakumari

Thank you, this was very beginner friendly and it helped me understand a lot of practical topics.

sahilmalhotra

Nice video, really appreciated. Could future sessions also cover setting up data preprocessing pipelines?

gurjeet
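
On the preprocessing pipeline request above: one possible way to chain imputation, scaling, one-hot encoding and logistic regression is scikit-learn's Pipeline and ColumnTransformer. The sketch below is only an illustration under stated assumptions; the toy data and the column names (temp, humidity, wind_dir, target) are hypothetical and not taken from the video.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy training data; all column names and values are hypothetical.
train_df = pd.DataFrame({
    "temp": [10.0, np.nan, 25.0, 30.0, 15.0, 28.0],
    "humidity": [80.0, 60.0, np.nan, 20.0, 75.0, 30.0],
    "wind_dir": ["N", "S", "N", "E", "S", "E"],
    "target": [1, 1, 0, 0, 1, 0],
})
numeric_cols = ["temp", "humidity"]
categorical_cols = ["wind_dir"]

# Numeric columns: fill missing values with the mean, then scale to (0, 1).
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", MinMaxScaler()),
])

# Categorical columns: one-hot encode, ignoring categories unseen in training.
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

clf = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression()),
])

# Fitting the pipeline fits every preprocessing step on the training data only.
clf.fit(train_df[numeric_cols + categorical_cols], train_df["target"])

# New rows go through the same fitted steps before prediction.
new_rows = pd.DataFrame({
    "temp": [12.0, 29.0],
    "humidity": [np.nan, 25.0],
    "wind_dir": ["W", "E"],  # "W" never appeared in the training data
})
preds = clf.predict(new_rows)
```

Wrapping the steps this way means the whole chain can be fitted, applied and saved as a single object.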

Thank you for such a detailed lecture. Very, very helpful. Would love to learn more.

tapomayeebasu

Very good tutorial. Elaborate and detailed. Thanks.

foodforthought

Hey, isn't it also common practice to fit the scaler only on the training set and then use it to transform the validation and test data?

anuphp
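
The practice described in the question above can be sketched like this: the scaler learns its minimum and maximum from the training rows only and is then reused on the validation rows. The numbers are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data.
train_X = np.array([[10.0], [20.0], [30.0]])
val_X = np.array([[15.0], [40.0]])  # 40 lies outside the training range

# Learn min and max from the training set only.
scaler = MinMaxScaler()
scaler.fit(train_X)

# Apply the same fitted scaler to both sets.
train_scaled = scaler.transform(train_X)  # values fall in [0, 1]
val_scaled = scaler.transform(val_X)      # values may fall outside [0, 1]
```

Because the validation data never influences the scaler, a validation value beyond the training range maps outside (0, 1). That is the expected cost of keeping the validation set unseen.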

I was working on a mini data science project in which I was given test.csv and train.csv datasets. I trained my model on the training data. Now, if I want to find the accuracy score of my model on the test data, what should I do? If I write model.predict(test_data), how do I compare the predicted test values to the true values? There are no target values in the test dataset.

UsmanKhan-tcsk
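
If the test file really has no target column, accuracy cannot be computed on it locally. A common workaround is to hold out part of the labelled training data as a validation set and measure accuracy there. The sketch below uses synthetic data as a stand-in for the labelled rows of a train.csv, so every value in it is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labelled training data.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% of the labelled rows as a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)

# The true labels are known for the validation rows, so accuracy can be computed.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
```

Predictions for the unlabelled test rows can still be generated with model.predict, but they can only be scored by whoever holds the true labels.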

(1:53:40) When you plot the weights, the negative weights are not considered.

Negative weights also affect the model, just in the opposite direction.

What are your thoughts: should the negative weights be considered?

siddharthsahu
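
One way to take negative weights into account, as the comment above suggests, is to rank features by the absolute value of their weights while keeping the sign for interpretation. The feature names and weights below are hypothetical; with a fitted scikit-learn LogisticRegression the weights would come from model.coef_.

```python
# Hypothetical feature names and weights, made up for illustration.
weights = {
    "humidity": 1.8,
    "pressure": -2.4,
    "wind_speed": 0.3,
    "sunshine": -1.1,
}

# Rank by magnitude so large negative weights are not pushed to the bottom.
ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
top_features = [name for name, _ in ranked]

for name, weight in ranked:
    direction = "raises" if weight > 0 else "lowers"
    print(f"{name}: {weight:+.1f} ({direction} the predicted probability)")
```

Comparing magnitudes this way is only meaningful when the input features are on a similar scale.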

I have a doubt. When we impute, we replace the missing values with the mean, taken from each column of the entire dataset.

The column means of the entire dataset will differ from the means computed on train_df, val_df and test_df separately, which introduces some discrepancy in the final result. What's your position on this? Should we impute based on the entire dataframe or on its subsets?

georgevavolil
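
To make the discrepancy in the question above concrete, the sketch below fits a mean imputer on the training rows only and compares its mean with the mean of the full dataset. The numbers are made up. Fitting on the training set alone is one common convention because it keeps test information out of preprocessing; it is shown here as an option, not as the method used in the video.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up single-column data with missing values.
train_X = np.array([[1.0], [np.nan], [3.0]])
test_X = np.array([[np.nan], [11.0]])

# Fit on the training rows only, then reuse the same imputer everywhere.
imputer = SimpleImputer(strategy="mean").fit(train_X)
train_filled = imputer.transform(train_X)
test_filled = imputer.transform(test_X)

# Mean learned from the training rows: (1 + 3) / 2
train_mean = imputer.statistics_[0]

# Mean over all rows, for comparison: (1 + 3 + 11) / 3
full_mean = np.nanmean(np.vstack([train_X, test_X]))
```

Here the training mean is 2.0 and the full-data mean is 5.0, so the choice changes the filled-in values. Whichever convention is used, the same fitted imputer should be applied to every subset.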

Hello. I have a question. Should we scale the features after the imputation or before? I ask because here you imputed the raw_df dataframe, which is not imputed. Thanks.

mayankraj
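
On the ordering question above, a small check on made-up data suggests that for mean imputation combined with min-max scaling, both orders give the same result when both steps are fitted on the same rows. This relies on recent versions of scikit-learn's MinMaxScaler ignoring NaN values during fit and passing them through in transform; other imputation strategies or scalers may behave differently.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Made-up column with one missing value.
X = np.array([[1.0], [np.nan], [3.0], [5.0]])

# Order 1: fill missing values with the mean, then scale to (0, 1).
imputed = SimpleImputer(strategy="mean").fit_transform(X)
impute_then_scale = MinMaxScaler().fit_transform(imputed)

# Order 2: scale first (NaN is kept as NaN), then fill with the scaled mean.
scaled = MinMaxScaler().fit_transform(X)
scale_then_impute = SimpleImputer(strategy="mean").fit_transform(scaled)

same_result = bool(np.allclose(impute_then_scale, scale_then_impute))
```

The two orders agree here because min-max scaling is a linear rescaling, so the mean of the scaled values equals the scaled mean.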

Would you mind switching to dark mode?
TIA

asifsaad

Thanks a lot, bro. It's a nice dataset, and you covered it very well from start to end.

dataninjaa

1:45:00 You fitted the model on the transformed columns, but when I do the same I still get a type error.

thakurprathiksinghrajput

Thanks, sir... but how do we deploy this on a website?

datahistory