Getting a random sample from your pandas data frame

Working with Python's pandas library for data analytics? If your data set is very large, you might sometimes want to work with a random subset of it. The "sample" method is perfect for that. In this video, I demonstrate the ways in which you can use the "sample" method on your data frames to get back precisely the number (or fraction) of rows you want.
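A minimal sketch of the two uses mentioned in the description, asking for an exact number of rows or for a fraction of them. The data frame and its column names are made up for illustration, and `random_state` is set only so the result is repeatable.

```python
import pandas as pd

# Hypothetical data frame standing in for a large data set.
df = pd.DataFrame({"id": range(1000), "value": [i * 2 for i in range(1000)]})

# Exactly 5 rows, chosen at random without replacement.
five_rows = df.sample(n=5, random_state=0)

# A fraction of the rows: 10% of 1000 rows gives 100 rows.
tenth = df.sample(frac=0.1, random_state=0)

print(len(five_rows), len(tenth))
```

Pass either `n` or `frac`, not both; pandas raises an error if both are given.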

Comments

Very useful for my Master's data science dissertation, as I'm working with a tremendously large dataset. Thanks a lot!

madhurakhaire

Very helpful, thank you so much. Your teaching skills are fantastic and smooth.

alaaeltayeb

Thanks! It was very useful for my homework.

biglicha

Thank you for this explanation, it was very helpful. One question: if I have a CSV file and I want to use exactly 1/4 of the dataset to train my model, and I don't want the selection to be random, what should I do? Thank you!

oueslatinihel
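One possible approach to the question above, sketched under assumptions: a non-random quarter can be taken by position with `iloc` instead of `sample`. The small data frame here is a stand-in for whatever `pd.read_csv` would return for the real file.

```python
import pandas as pd

# Stand-in for the data loaded from the CSV file (hypothetical contents).
df = pd.DataFrame({"x": range(20)})

# Number of rows in one quarter of the data, rounded down.
quarter = len(df) // 4

# Option 1: the first quarter of the rows, in file order.
first_quarter = df.iloc[:quarter]

# Option 2: every fourth row, spread evenly through the file.
every_fourth = df.iloc[::4]

print(len(first_quarter), len(every_fourth))
```

Taking the first rows only makes sense if the file is not sorted in a way that would bias the model; the evenly spaced variant is less sensitive to ordering.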

Thank you for sharing your knowledge! Is there a way to randomly choose just one value from a specific column?

nadjagomes
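A short sketch for the question above, assuming it means picking one value at random from a single column: a column is a Series, and a Series has its own `sample` method. The column name and values are invented for the example.

```python
import pandas as pd

# Hypothetical data frame with one column of interest.
df = pd.DataFrame({"city": ["Paris", "Lima", "Oslo", "Cairo"]})

# Sample one element from the column; the result is a one-element Series.
picked = df["city"].sample(n=1, random_state=1)

# Pull the plain value out of that one-element Series.
value = picked.iloc[0]

print(value)
```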

“In this world, no one teaches random sampling as clearly as you.”

AJAYVAIDSstudent

Hello there, I have a few questions about generating random samples with some sampling logic (10%) that covers every unique user in the given data set, where I could then assign the generated samples to 'n' users. I know what I'm asking is quite basic, but I couldn't find anything relevant online. Could you kindly help? (This is basically for generating audit samples from a CSV file.)

ImBatmanYT_CODM
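One way this could be approached, as a sketch under stated assumptions: group the rows by user, draw 10% from each group with `groupby(...).sample`, then hand the sampled rows out to reviewers in turn. The column names, the user IDs, and the auditor names are all hypothetical.

```python
import pandas as pd

# Hypothetical data: 3 users with 20 records each.
df = pd.DataFrame({
    "user": ["u1"] * 20 + ["u2"] * 20 + ["u3"] * 20,
    "record": range(60),
})

# Draw 10% of the rows within each user's group (2 of 20 here).
audit = df.groupby("user").sample(frac=0.1, random_state=0)

# Assign the sampled rows to a hypothetical list of auditors, round-robin.
auditors = ["alice", "bob"]
audit = audit.assign(
    auditor=[auditors[i % len(auditors)] for i in range(len(audit))]
)

print(audit.groupby("user").size())
```

A user with very few rows can end up with no sampled rows, because 10% of a small group rounds down to zero. If every user must be covered, such groups need separate handling, for example a minimum of one row per user.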

How does your file location autocomplete after you type ~ before courses?

kartik

Does the sample represent the actual population? I mean, if I train a model on the sample data set, will it also be correct for the actual population?

Is it good practice to train a model on samples?

spaceadvanture

I am using yearly data. Suppose my data has 33 rows and 20 columns, and the 20 columns include the years (1999 to 2022) in my summary statistics analysis. How can I exclude the year column from my whole analysis? Or should I delete the year column? Please advise, including any command for checking the data's shape.

atifdai

Thank you for the information, sir. But how do I exclude values less than or equal to zero (a different kind of sample)?

avibis
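A sketch for the question above, assuming the goal is to drop rows whose value is zero or negative before sampling: filter with a boolean mask first, then call `sample` on what remains. The column name and numbers are made up.

```python
import pandas as pd

# Hypothetical data with some zero and negative amounts.
df = pd.DataFrame({"amount": [-3, 0, 5, 12, -1, 7, 0, 9]})

# Keep only the rows where the amount is strictly greater than zero.
positive = df[df["amount"] > 0]

# Sample from the filtered rows only.
subset = positive.sample(n=2, random_state=0)

print(len(positive), len(subset))
```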

Is there any way to prove, from a statistical perspective, that Python's random sampling is indeed random?

l

Very helpful, thank you so much. Your teaching skills are fantastic and smooth.

ahmadjaradat