Data Science Project from Scratch - Part 4 (Exploratory Data Analysis)

This is part 4 of the Data Science Project from Scratch Series. In this video I perform an Exploratory Data Analysis (EDA) on the data that we collected from Glassdoor in part 2 and cleaned in part 3.

EDA is where we really start understanding our data. It is where we start analyzing trends and begin to find insights.

First, I do some additional data cleaning. I clean up the job titles, and do some feature engineering.
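The title cleanup can be sketched roughly as below. This is only an illustration: the helper names `simplify_title` and `seniority`, the keyword lists, and the category labels are hypothetical and may differ from what is used in the video.

```python
def simplify_title(title: str) -> str:
    """Map a raw job title to a coarse category (hypothetical keyword list)."""
    t = title.lower()
    # Pairs of (substring to look for, category label); the first match wins.
    keywords = [
        ("data scientist", "data scientist"),
        ("data engineer", "data engineer"),
        ("machine learning", "mle"),
        ("analyst", "analyst"),
        ("manager", "manager"),
        ("director", "director"),
    ]
    for needle, label in keywords:
        if needle in t:
            return label
    return "na"


def seniority(title: str) -> str:
    """Engineer a simple seniority feature from the title text."""
    t = title.lower()
    if any(w in t for w in ("senior", "sr.", "lead", "principal")):
        return "senior"
    if any(w in t for w in ("junior", "jr.")):
        return "junior"
    return "na"


print(simplify_title("Senior Data Scientist"), seniority("Senior Data Scientist"))
```

With pandas, helpers like these would typically be applied to a title column with `.apply(...)` to create the new feature columns.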

Next, I look at histograms and box plots of the continuous data fields. This is an important step because we want the data to be roughly normally distributed if we plan to do a regression analysis.
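A minimal sketch of this step with matplotlib, assuming a made-up list of average salaries (in thousands of dollars). The numbers are illustrative only and are not taken from the Glassdoor data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

# Hypothetical average salaries in thousands of dollars (not real data).
avg_salary = [56, 61, 72, 75, 80, 84, 88, 95, 102, 110, 125, 140, 190]

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: shows the shape of the distribution (skew, number of peaks).
counts, bin_edges, _ = ax_hist.hist(avg_salary, bins=5)
ax_hist.set_title("Histogram of avg_salary")

# Box plot: shows the median, the quartiles, and potential outliers.
ax_box.boxplot(avg_salary)
ax_box.set_title("Box plot of avg_salary")

fig.tight_layout()
```

In a notebook the figure renders inline without the `matplotlib.use("Agg")` line. A long right tail in the histogram is the kind of pattern that suggests a field may need transforming before a regression.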

We then look at the categorical data. We want to see what companies, states, industries, and sectors these jobs are offered in.
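For categorical fields, a count per category is usually the first thing to look at. The sketch below assumes a hypothetical `job_state` column with invented values.

```python
import pandas as pd

# Hypothetical postings; the column name and values are made up.
df = pd.DataFrame({"job_state": ["CA", "NY", "CA", "MA", "CA", "NY", "TX"]})

# value_counts() returns the number of rows per category, largest first.
state_counts = df["job_state"].value_counts()
print(state_counts.head(3))
```

The resulting counts can be passed straight to a bar plot to compare categories visually.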

After that, we make some pivot tables to better understand how average salary differs across our categorical variables.
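A small sketch of such a pivot table using `pd.pivot_table` and `sort_values`. The column names `job_simp` and `avg_salary` and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: simplified job title and average salary in thousands.
df = pd.DataFrame({
    "job_simp": ["data scientist", "data scientist", "analyst", "analyst", "data engineer"],
    "avg_salary": [120.0, 100.0, 60.0, 70.0, 105.0],
})

# Mean salary per job category, highest first.
pivot = pd.pivot_table(df, index="job_simp", values="avg_salary", aggfunc="mean")
pivot = pivot.sort_values("avg_salary", ascending=False)
print(pivot)
```

Passing a list to `index` (for example a state column plus the job category) gives a nested breakdown in the same way.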

Finally, we make a word cloud to visualize some of the most common words found in the descriptions.
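A word cloud sizes each word by how often it appears, so the underlying step is a word count. The sketch below shows only that counting step with the Python standard library. It is not the word cloud package used in the video, and the stop-word list and sample descriptions are made up.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"a", "and", "the", "to", "of", "in", "with", "for", "we", "are"}


def word_frequencies(texts):
    """Count lowercase alphabetic words across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


descriptions = [
    "We are looking for a data scientist with Python and SQL experience.",
    "Experience with machine learning and data pipelines in Python.",
]
freqs = word_frequencies(descriptions)
print(freqs.most_common(3))
```

A word cloud library would take text or frequencies like these and draw the most frequent words largest.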

Please stay tuned for part 5 where we start building models to predict salary!

#DataScience #KenJee #DataScienceProject

Comments
Author

Hey Everyone! I realized that there was so much information here, that an EDA could have made for a project itself. Please let me know in the comments section if you would be interested in a video where I go through the exact job related findings from the data in this phase!

KenJee_ds
Author

The best thing about this video was watching Ken getting stuck, trying hard to figure out how to resolve the errors and googling the seaborn code. Just made me feel that it is totally okay to not know everything. Thank you Ken :)

cooldudeutsav
Author

18:30 - Non-graphical descriptive analysis
20:10 - Making histograms
22:10 - Making box plots
24:30 - Correlation analysis
28:50 - Categorical variable analysis (bar plots based on count)
34:47 - Seaborn x-label rotations
45:00 - Pivot tables (pd.pivot_table, sort_values)

shresthaditya
Author

Hi Ken, a small tip for 35:30, instead of rotating axis labels I usually flip coordinates. It makes all the labels visible and the reader does not have to turn his/her neck to read the label. Great work by the way, really loving the content so far!

deepakdhankani
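
The tip above (flipping the coordinates rather than rotating labels) can be sketched with a horizontal bar chart. This example uses plain matplotlib rather than seaborn, and the sector names and counts are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

# Hypothetical posting counts per sector (not real data).
counts = {
    "Information Technology": 180,
    "Biotech & Pharmaceuticals": 112,
    "Business Services": 97,
}

# Sort ascending so the largest bar ends up at the top of the chart.
labels = sorted(counts, key=counts.get)
values = [counts[k] for k in labels]

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.barh(labels, values)  # categories on the y-axis, so labels read left to right
ax.set_xlabel("Number of postings")
fig.tight_layout()
```

Long category names stay readable because they run along the y-axis instead of being squeezed under the bars.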
Author

In case you are confused by multiplying the hourly rate by 2, here is what happens under the hood: 2000 = 8 hrs/day * 5 days/week * 50 weeks. The year has 52 weeks, but 2 of those weeks are assigned to holidays and vacation time.

johnhillescobar
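
The arithmetic in the comment above can be sketched as follows. One assumption that the comment does not state: salaries are taken to be stored in thousands of dollars, which is why multiplying by 2000 hours and dividing by 1000 reduces to multiplying by 2. The function name is hypothetical.

```python
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WORK_WEEKS_PER_YEAR = 50  # 52 weeks minus 2 for holidays and vacation

HOURS_PER_YEAR = HOURS_PER_DAY * DAYS_PER_WEEK * WORK_WEEKS_PER_YEAR  # 2000


def hourly_to_annual_thousands(hourly_rate):
    """Convert an hourly rate in dollars to an annual salary in thousands of dollars."""
    # Assumption: annual salaries are expressed in thousands of dollars.
    return hourly_rate * HOURS_PER_YEAR / 1000


print(hourly_to_annual_thousands(25))
```

So a rate of 25 dollars per hour corresponds to 50 (thousand dollars per year), the same as multiplying the hourly figure by 2.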
Author

I am learning a lot from this series; it is real-world, project-based learning. This is what an inexperienced person looks for. I want to see similar projects from you in the future. I also like your unedited videos showing us how you resolve coding errors. Thank you for your time and effort.

unpatel
Author

Man, as a newbie I can't thank you enough for this playlist. It helped me tie loose concepts together.

calculadorahoraextra
Author

This playlist is very good.

Best way to learn

helligusvartproject
Author

I loved this work. It helps to imagine and anticipate the daily problems a data scientist may face and solve. Thank you very much, I appreciate your work!

metinunlu_
Author

I'm watching the whole series in one night, perfect for beginners!! Greetings from Barcelona

misterivi
Author

I loved the compelling data storytelling during your exploratory data analysis phase. I can't wait for part five. Thank you for your effort and value. More grace Sir

bibislyvie
Author

This is very interesting because I have been doing EDA for my internship, so this helps me a lot. I can't wait for the next project series 💯

shrutijain
Author

Hey Ken, it's very interesting. As a beginner, it makes me more curious to learn new concepts, and I am following all your playlists.

pjyothiph
Author

This is a simple and extensive mode of EDA, really inspiring, am adopting best practices

akandetemitope
Author

Thank you Ken! I liked that you did not edit "stuck" part ... Gives some motivation to a beginner like me. 😅

tusharbedse
Author

Great content Ken... more like therapy for data science learners. Loved the way you explained while working through it.

sindhuorigins
Author

Way to go! This was what I was looking for - the actual number jiggling...

НиколайТодоров-ит
Author

Thank you so much for this video. Your channel has by far the most helpful content on data science on YouTube, and it's not even close. Keep up the great work!

lucrieffel
Author

Great stuff Ken!!! Even though I'm an R guy, I still find these types of videos very valuable. It's great to see the actual thought process and methodology. Reminds me of gameplay videos lol

yousefals
Author

Wow, as a newbie, watching you do some cool stuff is so interesting :D. Especially the last part with that WordCloud. I think it's an amazing package

sonsangsom