Solving Real-World Data Analysis Tasks with Python Pandas & Dataiku DSS (Movie Analysis)

In this video we walk through a series of real-world data analysis tasks using a Netflix movie & TV show dataset. We start by solving the tasks using the Python Pandas library. We then complete the same problems using the Dataiku Data Science Studio.

Being knowledgeable about various tools in the data science space is very important to becoming a senior team member & making management level decisions. Different problems & team dynamics call for different solutions. Seeing a wide range of technology can help you to make educated decisions and level up your overall team impact.

Pandas skills worked on in this video:
- General Python Pandas Knowledge
- Using groupby method and aggregating values
- Sorting columns by value (ascending & descending)
- Converting columns to datetime, parsing dates
- Strategically iterating through dataframes and counting values
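
The Pandas skills listed above can be sketched on a tiny hand-made table. The column names (`type`, `release_year`, `date_added`) and every row below are assumptions for illustration, not the video's actual data or code.

```python
import pandas as pd

# Hypothetical rows; the real dataset and its column names may differ.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "release_year": [2018, 2018, 2019, 2017],
    "date_added": ["September 9, 2019", "January 1, 2020",
                   "March 15, 2020", "December 31, 2020"],
})

# groupby + aggregate: count movies per release year, then sort descending.
movies = df[df["type"] == "Movie"]
per_year = (
    movies.groupby("release_year")["title"]
    .count()
    .sort_values(ascending=False)
)
top_release_year = per_year.index[0]

# Parse the date strings, then count titles by the year they were added.
df["date_added"] = pd.to_datetime(df["date_added"])
added_per_year = df["date_added"].dt.year.value_counts()
top_added_year = added_per_year.idxmax()
```

On real data the date column may contain missing or oddly formatted values, so the parsing step would likely need extra care.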

Dataiku skills worked on in this video:
- Dataiku DSS introductory & intermediate knowledge
- High level column & dataset analysis
- Dataiku processing steps such as Prepare, Sample & Filter, and Groupby
- Dataiku Split & Fold method
- Parsing dates to extract year & month

----------------------
Video Timeline!
0:00 - Introduction & Video Overview
1:18 - Getting started with the Data & Code
4:22 - Task #1 (Python): What is the most popular release year for movies on Netflix?
9:42 - Task #2 (Python): What year did Netflix add the most content to its platform?
16:18 - Task #3 (Python): What is the most popular month to add new content?
20:10 - Task #4 (Python): What is the movie with the longest title in the dataset?
23:54 - Task #5 (Python): Which actor/actress appeared in the most movies & tv shows?
35:48 - Getting started with Dataiku DSS!
38:05 - Task #1 (Dataiku DSS): Most popular release year for movies on Netflix
41:00 - Task #2 & #3 (Dataiku DSS): What was the most popular year & month to add content on Netflix?
44:31 - Task #4 (Dataiku DSS): What is the longest movie title in the dataset?
47:10 - Task #5 (Dataiku DSS): Which actor/actress appeared in the most Netflix movies & tv shows?
56:40 - Video Recap & Conclusion
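
As a rough illustration of what Tasks #3 and #4 in the timeline ask for, here is a minimal Pandas sketch. The rows, titles, and column names are made up, and this is not necessarily how the video solves them.

```python
import pandas as pd

# Hypothetical rows; titles and dates are invented for illustration.
df = pd.DataFrame({
    "title": ["Up", "A Much Longer Movie Title", "Mid Title"],
    "type": ["Movie", "Movie", "TV Show"],
    "date_added": pd.to_datetime(["2019-12-01", "2020-12-15", "2020-07-04"]),
})

# Task #3 style: the month in which content was added most often.
top_month = df["date_added"].dt.month_name().value_counts().idxmax()

# Task #4 style: the movie whose title has the most characters.
movies = df[df["type"] == "Movie"]
longest_title = movies.loc[movies["title"].str.len().idxmax(), "title"]
```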

Free Dataiku Learning Resource:

From LEARN Media
Comments
Author

Shout out to Dataiku for having me on the channel! Hopefully you all find this lesson informative. Let me know if you have any questions 😀.

KeithGalli
Author

About 1.5 years ago I found your video about solving real-world data analysis tasks with Pandas. Ever since, I have used Python/Pandas daily, and I landed a job as a data analyst. I wanted to thank you for all the content. It's really liberating to not have to use Excel as often anymore :). Welcome back and I hope you are feeling better.

putyah
Author

Amazing lesson, it was easy to follow how you came up with the solution to each task. Also a great demonstration of the Dataiku tool. This is a much easier way of analyzing data without programming.

danmold
Author

Awesome lesson, I learnt a lot from your previous videos too.
For Question 5, this 2-line code will also do the trick:

artists_list = df['cast'].str.split(', ').explode()
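
Only the first line of that suggestion appears in the comment. One possible way to finish the count is sketched below; the `value_counts()` step and the sample data are assumptions, not the commenter's original second line.

```python
import pandas as pd

# Hypothetical cast data; None marks a title with no cast listed.
df = pd.DataFrame({
    "cast": ["Ann Lee, Bob Ray", "Bob Ray", None, "Ann Lee, Bob Ray, Cy Dow"],
})

# Split each cast string into a list, then give every name its own row.
artists_list = df["cast"].str.split(", ").explode()

# Count appearances per name; missing values are skipped by default.
appearances = artists_list.value_counts()
```

Here `appearances` is sorted from most to least frequent, so its first index entry is the most-credited name.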

skamalu
Author

Thank you Keith, informative as usual.👌

lamya
Author

Hey Superhero🦋❣️ I love your way of explaining 😼... Love from India... For the 5th problem you wrote out all that logic ✨ and it is interesting, but I know a shorter way to solve the problem. I want to share it with you, though you probably know it already:


from collections import Counter

# Tally how often each name appears across all cast lists.
cast_counter = Counter()
for cast in movie_df["cast"].dropna():  # skip titles with no cast listed
    cast_counter.update(cast.split(", "))
top_15 = cast_counter.most_common(15)  # list of (name, count) tuples


The output is a list of tuples, each holding a cast member and the number of titles that cast member appeared in.


Thank you ❣️

anantharjun
Author

Thanks for the video. I got a question though, ***spoilers***



can't we just use the value_counts() instead of this quite complicated group by expression?
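
To illustrate the question, here is a small sketch comparing the two routes on made-up data. The `release_year` column name and its values are assumptions, and the groupby line is a generic example rather than the exact expression from the video.

```python
import pandas as pd

# Hypothetical release years for a handful of movies.
df = pd.DataFrame({"release_year": [2018, 2017, 2018, 2019, 2018, 2017]})

# groupby route: group, count rows per group, sort descending.
by_groupby = df.groupby("release_year").size().sort_values(ascending=False)

# value_counts route: counts and sorts descending in a single call.
by_value_counts = df["release_year"].value_counts()

# Both routes agree on the counts for every year.
same_counts = by_groupby.to_dict() == by_value_counts.to_dict()
```

For a plain frequency count the two give the same numbers; groupby becomes more useful when several columns or aggregations are involved.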

lex
Author

If I can analyze and visualize data, do ML/DL, and deploy it all with Dataiku... did I learn pandas, NumPy, Docker, Airflow, and scikit-learn for nothing?

julien
Author

Hello sir, can you please do a tutorial on how to append datasets in a single Excel file using Dataiku?
Thanks in advance🙏

shrivatsaupadhyaya
Author

Can you tell me how to write an if/else condition in Dataiku, with one branch for the equal case and one for everything else?

pixiestar_
Author

For the first problem you could have simply used .value_counts() for a one-line solution.

vinny