Data Science Beginner Project: Kaggle House Prices Regression Analysis (Full Walkthrough)

Welcome to our latest data science project! In this tutorial, we dive into advanced regression analysis using Kaggle's House Prices dataset. The code from this project achieved a top 10% score!

Interested in discussing a Data or AI project? Feel free to reach out via email or simply complete the contact form on my website.

WHO AM I?
As a full-time data analyst/scientist at a fintech company specializing in combating fraud within underwriting and risk, I've transitioned from my background in Electrical Engineering to pursue my true passion: data. In this dynamic field, I've discovered a profound interest in leveraging data analytics to address complex challenges in the financial sector.

This YouTube channel serves as both a platform for sharing knowledge and a personal journey of continuous learning. With a commitment to growth, I aim to expand my skill set by publishing 2 to 3 new videos each week, delving into various aspects of data analytics/science and Artificial Intelligence. Join me on this exciting journey as we explore the endless possibilities of data together.

*This is an affiliate program. I may receive a small portion of the final sale at no extra cost to you.
Comments

Thanks for checking out this video.


*Both Datacamp and Stratascratch are affiliate links.

RyanAndMattDataScience

Ignoring the bad video cropping, you are an awesome dude!

satvik

Hey guys I hope you enjoyed the video! If you did please subscribe to the channel!

I do plan on updating it + adding more notes/comments to it.


Up next I'm working on a Python Classes course and the start of a series on Deep Learning!

RyanAndMattDataScience

If it's relevant at all, I'd recommend that when you zoom in on the screen, you move the zoom to the spot you are reading or talking about. Often in the video the zoomed area wasn't relevant.

TheErick_

You are the best teacher. Keep it up! I once started on Kaggle but have not entered any competition, and this encourages me to consider it.

kwizeralambert

A lot of effort was put into this video, thanks! In future videos, make sure to keep the whole of your screen in the video.

ilyosjonnishanov

I just finished it. Dope... Thanks so much.

elfincredible

This is fantastic and I've subscribed to your channel. I'm new to this, but people like you who spend their time creating videos like this are commendable. I hope to give back like this one day. Also, you mentioned someone on Kaggle that you got some tips from. Who was that? I'm fascinated to know who has more knowledge than someone like you, who has heaps.

mgrahamization

Ok, dude... I haven't even watched the video yet. I'm just here to say that on my way home from work today I was thinking about doing this EXACT project and I completely forgot about it. All of a sudden your video pops up on my feed... Yo, data science out here reading minds!

mattadata

The video is great, you earned a new subscriber.
Maybe I wasn't focused or my understanding is limited. Can you please explain briefly why you did box plots for the categorical columns? It looked like you were only filling values with 'No' and '0'. Thank you.

miftahuladib-nv
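
On the box plot question above: one plausible reason to plot a categorical column against SalePrice is to check whether the filled-in level (for example 'No') sits in a different price range than the other levels. The sketch below is not the video's code. It uses real column names from the House Prices dataset (`PoolQC`, `GarageArea`, `SalePrice`) with invented values, and prints the per-category quartiles that a box plot would draw.

```python
import pandas as pd

# Toy data: invented values, real column names from the House Prices dataset.
df = pd.DataFrame({
    "PoolQC": ["Gd", None, None, "Ex", None, "Gd"],
    "GarageArea": [480.0, None, 300.0, 600.0, 0.0, 520.0],
    "SalePrice": [250000, 120000, 140000, 400000, 110000, 260000],
})

# NA in PoolQC means "no pool" per the data description, so label it explicitly.
df["PoolQC"] = df["PoolQC"].fillna("No")
# A missing garage area is treated as "no garage" (an assumption for this sketch).
df["GarageArea"] = df["GarageArea"].fillna(0)

# The numbers a box plot would draw: per-category quartiles of SalePrice.
summary = df.groupby("PoolQC")["SalePrice"].describe()[["25%", "50%", "75%"]]
print(summary)
```

If the 'No' row of `summary` is clearly separated from the other levels, the fill value carries signal and is worth keeping as its own category.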

Thanks a lot for this project. I learned a lot (especially about stacked regressors and voting regressors), including how to filter out outliers and fill NA values using description.txt and a little real-world knowledge.

AdityaSharma-fx
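
For readers unfamiliar with the two ensembles mentioned above, here is a minimal sketch using scikit-learn's `VotingRegressor` and `StackingRegressor`. The base models, hyperparameters, and synthetic data are assumptions chosen for illustration, not the ones used in the video.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the housing features.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

base_models = [
    ("ridge", Ridge(alpha=1.0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
]

# Voting: average the base models' predictions.
voting = VotingRegressor(estimators=base_models)
# Stacking: fit a final model on the base models' out-of-fold predictions.
stacking = StackingRegressor(estimators=base_models, final_estimator=Ridge(), cv=5)

for name, model in [("voting", voting), ("stacking", stacking)]:
    model.fit(X_train, y_train)
    print(name, round(model.score(X_val, y_val), 3))  # R^2 on held-out rows
```

Voting gives every base model equal weight, while stacking lets the final estimator learn how much to trust each one.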

First off, I want to say great video; this really helped me get a good workflow down for Kaggle competitions.
But doesn't doing train_test_split after all the preprocessing is done risk data leakage?

I recently finished the Intermediate Machine Learning course on Kaggle, and one of the sections really emphasized that unless you're passing your pipeline and model into cross-validation, preprocessing should always be done AFTER train_test_split.

To my understanding, by preprocessing first and then splitting, this means that our model is being trained on scaled, imputed, and encoded values and the validation data is also preprocessed.

So the model will perform well during validation, but when it is exposed to the test_data, which does not have scaled or imputed values, it will perform poorly.

Am I missing something here?
I'm only 2 hours and 15 minutes into the video so if he addresses this later, sorry!

krssovr
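
The leakage concern raised above is that imputation and scaling statistics (medians, means, standard deviations) get computed with validation rows included. One common way to avoid it is to put the preprocessing inside a scikit-learn `Pipeline`, so the statistics are learned from training rows only and the same fitted transforms are then applied to validation and test rows. The sketch below uses synthetic data and an arbitrary `Ridge` model; it is an illustration, not the notebook from the video.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: a linear target, then ~5% of feature values knocked out.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)
X[rng.random(X.shape) < 0.05] = np.nan

# Split first, before any statistic is computed.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Imputer and scaler live inside the pipeline, so their statistics are
# learned from training rows only, then reused unchanged on new rows.
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), Ridge())
model.fit(X_train, y_train)
val_r2 = model.score(X_val, y_val)

# With cross-validation, the whole pipeline is refit inside every fold.
cv_r2 = cross_val_score(model, X_train, y_train, cv=5).mean()
print(round(val_r2, 3), round(cv_r2, 3))
```

The same fitted `model` would be used for `model.predict` on the competition test set, so test rows receive the training-set medians and scales rather than being left unprocessed.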

Thank you for creating this video. Can you expand more on why you did not include both Lasso and ElasticNet at the 2:25:10 mark? I'm curious if it made the Stacking Regressor worse at the very end in your original notebook.

mattysmirks
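
Whether adding Lasso and ElasticNet helps or hurts a stack depends on the data, so one way to answer a question like the one above is to cross-validate the stack with and without them. The sketch below does that on synthetic data. The base models, `alpha` values, and the helper name `cv_rmse` are assumptions for illustration, and the result says nothing about the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=8, noise=15.0, random_state=1)

core = [
    ("ridge", Ridge()),
    ("rf", RandomForestRegressor(n_estimators=30, random_state=1)),
]
extra = [
    ("lasso", Lasso(alpha=0.1)),
    ("enet", ElasticNet(alpha=0.1, l1_ratio=0.5)),
]


def cv_rmse(estimators):
    """Mean cross-validated RMSE of a stack built from `estimators`."""
    stack = StackingRegressor(estimators=estimators, final_estimator=Ridge(), cv=3)
    scores = cross_val_score(
        stack, X, y, cv=3, scoring="neg_root_mean_squared_error"
    )
    return -scores.mean()


rmse_core = cv_rmse(core)
rmse_all = cv_rmse(core + extra)
print(f"without Lasso/ElasticNet: {rmse_core:.2f}")
print(f"with Lasso/ElasticNet:    {rmse_all:.2f}")
```

Highly correlated base models (Ridge, Lasso, and ElasticNet often are) may add little to a stack, which is one possible reason to leave some of them out.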

Shouldn't we use IQR/boxplot to check for outliers? Does "outliers" refer to outliers of the distribution or outliers of the relationship?

lyk
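
The two kinds of outlier in the question above can be told apart in code. The sketch below applies the usual 1.5 × IQR rule twice: once to a single variable (distribution outliers) and once to the residuals of a straight-line fit (relationship outliers). The numbers are invented, and the helper name `iqr_outliers` is hypothetical.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Invented data: living area (sq ft) and sale price (USD).
area = np.array([900, 1100, 1300, 1500, 1700, 1900, 2100, 2300, 5000], dtype=float)
price = np.array([110, 132, 148, 171, 189, 300, 231, 249, 520], dtype=float) * 1000

# Distribution outliers: unusual values of one variable on its own.
area_mask = iqr_outliers(area)

# Relationship outliers: points far from the area -> price trend line.
slope, intercept = np.polyfit(area, price, deg=1)
residuals = price - (slope * area + intercept)
resid_mask = iqr_outliers(residuals)

print("unusual area:       ", area[area_mask])
print("unusual price/area: ", area[resid_mask])
```

Here the 5000 sq ft house is flagged by area alone but follows the trend, while the 1900 sq ft house has an ordinary area and an unusually high price for it. Which kind to remove is a modeling choice.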

Your videos are great. I just love this channel. Just kindly try to keep the recording focused on the code when you are typing. 🙂

shivamsapru

i knew u looked familiar and then saw the vintage cards in the back Lol, im subscribed to ur card channel too

vancouverrrr

Hey Nolan, do you have separate tutorials for every machine learning model you used in this tutorial?

s.s.sdhyuthidhar

I'm a little bit over an hour in and good video so far! I think you could have saved a lot of time doing many things programmatically so far though.

richardweston

I'm trying to build something similar, but instead of prediction they have asked me to explain house prices:

A data science model that explains how different factors (GDP, unemployment, interest rate, etc.) impacted home prices over the last 20 years.

Any suggestions on what type of model I should use for this problem?

OrangeTomato

With some tuning I got 0.018.
Can you make more competition videos like this? I love them.

itsmephougat