Generating Mock Data with Python! (NumPy, Pandas, & Datetime Libraries)

In this video we write a Python script to automatically generate a sales dataset. To do this we use the NumPy, Pandas, Calendar, and Datetime libraries. This is the data used in my last video, “Solving real world data science problems with python pandas”.

Link to the last video:

Link to finished code on GitHub:

Useful resources!

Detailed video description!
We start by creating a simple dataframe and programmatically adding rows of product purchases to it. We use the random library to select these products.
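A minimal sketch of this first step, not the code from the video: the catalog, prices, weights, column names, and the `generate_orders` helper are all made up for illustration. It uses `random.choices` from the standard library, where the optional weights make some products more likely than others.

```python
import random

import pandas as pd

# Hypothetical catalog (names, prices, and weights are made up for illustration):
# product name -> (price, relative likelihood of being purchased)
PRODUCTS = {
    "USB-C Charging Cable": (11.95, 30),
    "AA Batteries (4-pack)": (3.84, 30),
    "27in Monitor": (149.99, 6),
    "Laptop": (999.99, 2),
}
COLUMNS = ["Order ID", "Product", "Price Each"]


def generate_orders(n_orders, seed=None):
    """Build a DataFrame with one randomly chosen product per order."""
    rng = random.Random(seed)
    names = list(PRODUCTS)  # a list, because dict_keys cannot be indexed
    weights = [PRODUCTS[name][1] for name in names]
    rows = []
    for order_id in range(1, n_orders + 1):
        # choices() always returns a list, so [0] pulls out the single pick
        product = rng.choices(names, weights=weights)[0]
        rows.append([order_id, product, PRODUCTS[product][0]])
    return pd.DataFrame(rows, columns=COLUMNS)


df = generate_orders(5, seed=0)
csv_text = df.to_csv(index=False)  # pass a file path instead to write a CSV file
```

Collecting the rows in a plain list and building the DataFrame once at the end is usually much faster than appending to a DataFrame inside the loop.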

We make our data more realistic by using normal and geometric distributions in NumPy to vary the number of purchases generated and the quantity of each item purchased.
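One way the two distributions could be used, as a rough sketch: the helper names, the mean of 12,000 orders, the 10% spread, and `p=0.8` are assumptions chosen for illustration, and the sketch uses NumPy's `default_rng` generator, which may differ from the calls used in the video.

```python
import numpy as np

rng = np.random.default_rng(42)


def orders_for_month(mean_orders, spread=0.1):
    """Draw a month's order count from a normal distribution around mean_orders."""
    n = rng.normal(loc=mean_orders, scale=mean_orders * spread)
    return max(0, int(round(n)))  # a count cannot be negative or fractional


def quantity_ordered(p=0.8):
    """Draw a quantity from a geometric distribution (always 1 or more).

    With p=0.8, a quantity of 1 comes up about 80% of the time and
    larger quantities become rapidly less likely.
    """
    return int(rng.geometric(p))


n_orders = orders_for_month(12_000)
quantities = [quantity_ordered() for _ in range(1_000)]
```

A higher `p` pushes almost every order to a quantity of 1, which could suit expensive items; a lower `p` gives bulk purchases more often, which could suit cheap ones.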

We use the datetime library to allow us to generate thousands of different times for each purchase with the most common times peaking around 12pm and 8pm.
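A sketch of how order times with two peaks might be produced using only the standard library. The `generate_order_time` helper, the three-hour standard deviation, and the example date are assumptions; the video may build its times differently.

```python
import datetime
import random

MINUTES_PER_DAY = 24 * 60


def generate_order_time(day, rng, peak_hours=(12, 20), std_minutes=180):
    """Return a datetime on `day`, clustered around one of the peak hours."""
    peak = rng.choice(peak_hours)       # pick the 12pm or the 8pm peak
    offset = rng.gauss(0, std_minutes)  # normally distributed offset in minutes
    # Wrap around midnight so the order always stays on the requested day
    minutes = int(peak * 60 + offset) % MINUTES_PER_DAY
    midnight = datetime.datetime.combine(day, datetime.time())
    return midnight + datetime.timedelta(minutes=minutes)


rng = random.Random(1)
example_day = datetime.date(2019, 4, 15)  # arbitrary example date
times = [generate_order_time(example_day, rng) for _ in range(1000)]
print(times[0].strftime("%m/%d/%y %H:%M"))
```

Adding a `timedelta` to a fixed starting point keeps the date arithmetic inside the datetime library, so there is no need to handle hour and minute rollover by hand.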

We take a list of the most common US street addresses to help us randomly generate an address for each purchase.
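A small sketch of the idea, assuming the list holds street names and that each address is a house number, street, city, state, and ZIP code. The street names, cities, ZIP codes, address format, and the `generate_random_address` helper are illustrative placeholders, not the list used in the video.

```python
import random

# Placeholder data for illustration only
STREET_NAMES = ["Main", "2nd", "1st", "Park", "Oak", "Maple", "Elm", "Washington"]
CITIES = {
    "San Francisco": ("CA", "94103"),
    "Austin": ("TX", "73301"),
    "Boston": ("MA", "02215"),
}


def generate_random_address(rng):
    """Combine a random house number, street, and city into one address string."""
    street = rng.choice(STREET_NAMES)
    city = rng.choice(list(CITIES))
    state, zip_code = CITIES[city]
    house_number = rng.randint(1, 999)
    return f"{house_number} {street} St, {city}, {state} {zip_code}"


address = generate_random_address(random.Random(0))
print(address)
```

To make some cities more common than others, `rng.choice` could be swapped for `rng.choices` with a list of weights.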

Hope you guys enjoy! Make sure to subscribe if you haven’t already :)

Practice your Python Pandas data science skills with problems on StrataScratch!

Join the Python Army to get access to perks!

---------------------------------------------

Follow me on social media!

---------------------------------------------

Today’s merch!
Creator: @Chris Chann

---------------------------------------------

Video Timeline!
0:00 - Intro & Background Info
1:15 - What we're creating in this video!
2:03 - Start writing code (generating a simple dataframe & csv)
8:26 - Task: Making our data more realistic, selecting some products with higher probability than others
14:15 - Task: Generate 12 months worth of data in 12 csvs (calendar library, f-strings)
18:12 - Make some months have more purchases than others
19:28 - Normal distributions in NumPy
23:43 - Improving speed of our code (making testing easier)
26:41 - Task: Generate random addresses for our data
35:03 - Task: Generate order times for purchases (datetime library overview)
40:02 - Using timedelta objects to add & subtract time from dates
45:09 - Generate a realistic quantity ordered for each product (using numpy geometric distribution)
49:38 - Add multiple items being more likely to be sold together and cleaning code a bit
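The 14:15 step in the timeline (12 months of data in 12 CSVs) can be sketched roughly as follows. The `monthly_files` helper, the file naming pattern, and the example year are assumptions; only the use of the calendar library and f-strings comes from the timeline.

```python
import calendar


def monthly_files(year):
    """Return a (filename, days_in_month) pair for each month of `year`."""
    files = []
    for month in range(1, 13):
        month_name = calendar.month_name[month]  # e.g. "January" (index 0 is empty)
        # monthrange returns (weekday of the 1st, number of days in the month)
        _, days_in_month = calendar.monthrange(year, month)
        files.append((f"Sales_{month_name}_{year}.csv", days_in_month))
    return files


files = monthly_files(2019)  # arbitrary example year
for filename, days_in_month in files:
    print(filename, days_in_month)
```

Knowing the number of days in each month is what makes it possible to pick a valid random day for every order, including February in leap years.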

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.
Comments

Wanted to get this video out before the month ended! (shooting for 2 a month). Hope you enjoyed :)

KeithGalli

This is by far the best python data analysis tutorial I have seen on Youtube. Thank you so much Keith. Appreciate it.

thatgothulk

Hey Keith, thanks for this video! Stats person (R-coding) here, switching to Python, I am currently planning a complex ML - simulation study as part of my PhD and this video is all I needed out of the whole internet. I am so glad I found your channel, you really helped with my impostor! Big thank you! <3

adt

Still on the journey of going through your entire data science playlist. Eager to see what I'm about to learn in this video.

misterjava

Hey man. I appreciate your videos. They really help me understand python language and how to apply it to data analysis. ✌️

viq

Best follow-along tutorials on Python libraries! Thanks, Keith!

samirz

Really great and detailed tutorial 👏🏻👏🏻🙏🙏 Thanks.
I searched through a lot of online solutions, but none of them fulfilled my requirements. This is great: I can build my own generator. I also learned some Python 🐍 Thanks 🙏 😊

IamSHVA

I appreciate your efforts. Your knowledge helped me generate mock data.

ssbigdata

Hey Keith, why do you place [0] after random.choices() to make it a list? This is around 14:00 minutes in

CountLife

Exactly what I was looking for, thank you.

romuloyloy

Hey Keith, can you make a dedicated video on Time Series Analysis, please?

alokshukla

Dude, I love your videos. You are awesome, and so flexible! I mean, you know how to move around in this area: if you don't have something, you know where to look. Like a lawyer who knows where to find the law and, instead of wasting time memorizing it, knows how to get to the key points through good indexing. Sorry if that wasn't the best example. Thanks for these videos. I really learned a lot about these tools. Thank you, thank you, thank you ♥♥♥

LNMLucasMasiero

Brilliant, especially when you throw some real statistics into it (random, distributions, etc.). This will be very helpful in building simulations for model testing. Thanks!

yairtsur

How would you recommend generating mock data with categorical columns and datetime stamps?

anupambanerjee

It displays the errors "Cannot choose from an empty sequence" and "'dict_keys' object is not subscriptable". Can anyone help me, please? I really need this.

habeshavideos

Faker is a pretty hefty library for fake data generation, including region localization

Vbnklabj

Hi Keith,
I have a question about extracting a time value (for example, 6.25.37) from an Excel sheet in Python.
How do I get the proper output?

harikrishna

Just here to mark my attendance! I'll watch this very soon, over the weekend. However, can you make videos on advanced data science projects? Like advanced projects with in-depth explanations!

SMFahim-vozn

Hey man, I appreciate your videos. However, could you do a video on how to scrape data from web pages? I had to scrape data from a Russian web page and I could not. Thanks again.

galymzhankenesbekov

Hi Keith, great video, I'm learning to be a Python developer, so for practice, I've taken your base code and refactored it. Is there any way I could send you the code and get your honest feedback/critique on it? I'm finding it very difficult to get a job and would appreciate any feedback. Thanks.

cordularaecke