Resampling - p.9 Data Analysis with Python and Pandas Tutorial

Welcome to another data analysis with Python and Pandas tutorial. In this tutorial, we're going to be talking about smoothing out data by removing noise. There are two main methods to do this. The most popular method is what is called resampling, though it goes by many other names. This is where we have some data that is sampled at a certain rate. For us, we have the Housing Price Index (HPI) sampled at a one-month rate. We could sample the HPI more often, every week, every day, every minute, or more, but we could also resample it less often: every year, every 10 years, and so on.
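
A minimal sketch of downsampling, assuming a made-up monthly series in place of the real HPI data (the values below are hypothetical). Each year's twelve monthly values are collapsed into their mean. The annual rule is passed as pd.offsets.YearEnd() because the string aliases have changed between pandas versions (older code uses 'A', newer releases prefer 'YE').

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data standing in for the HPI (24 months, 2000-2001).
idx = pd.date_range("2000-01-01", periods=24, freq="MS")  # month-start dates
hpi = pd.Series(np.arange(24, dtype=float), index=idx, name="HPI")

# Downsample monthly to yearly: every value in a calendar year is averaged.
annual = hpi.resample(pd.offsets.YearEnd()).mean()
print(annual)
```

Swapping .mean() for .sum(), .max() or .ohlc() changes how each yearly bucket is summarised.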

Another environment where resampling almost always occurs is stock prices. Stock prices change intra-second, but free stock data is usually resampled to one-minute intervals at the finest. You can buy access to live data, however. On a long-term scale, the data will usually be sampled daily, or even every 3-5 days. This is often done to keep the size of the data being transferred low. For example, over the course of one year, intra-second data usually amounts to multiple gigabytes; transferring all of that at once is unreasonable, and people would be waiting minutes or hours for pages to load.

Using our current data, which is sampled once a month, how might we resample it to once every 6 months, or once every 2 years? Try to think about how you might write a function yourself to perform that task. It's a fairly challenging one, but it can be done. That said, it's a fairly computationally inefficient job, but Pandas has our backs and does it very fast.
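
One way to approach the exercise, as a hedged sketch: a naive hand-written function that averages consecutive blocks of samples, next to pandas resample with 6-month and 24-month bins. The function name chunk_means and the data are hypothetical. The hand-rolled version only lines up with the calendar because the made-up series starts in January and has no gaps; resample works from the actual timestamps instead.

```python
import numpy as np
import pandas as pd

def chunk_means(values, n):
    """Hypothetical hand-rolled resampler: average consecutive blocks of n samples.

    Assumes evenly spaced samples (one per month here) that start at the
    beginning of a block; it knows nothing about calendar dates.
    """
    return [sum(values[i:i + n]) / len(values[i:i + n]) for i in range(0, len(values), n)]

# Hypothetical monthly series, 24 months starting January 2000.
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
monthly = pd.Series(np.arange(24, dtype=float), index=idx)

# The same job with pandas: 6-month and 24-month bins anchored on month starts.
six_month = monthly.resample(pd.offsets.MonthBegin(6)).mean()
two_year = monthly.resample(pd.offsets.MonthBegin(24)).mean()

print(chunk_means(monthly.tolist(), 6))  # [2.5, 8.5, 14.5, 20.5]
print(six_month.tolist())                # [2.5, 8.5, 14.5, 20.5]
print(two_year.tolist())                 # [11.5]
```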

Comments
Author

Hi Harrison,
Hope you're well.
Was wondering if you could do a series on a couple of deep learning algorithms.
Best Regards
Andrew

andrewczeizler
Author

Thank you, sir, for the great tutorials.

onmoog-xycs
Author

Hi Harrison,
Just an FYI, when I ran this today I got the following message:

FutureWarning: how in .resample() is deprecated the new syntax is .resample(...).mean()

resample('A', how=mean) worked, but gave the warning.
resample('A').mean() and resample('A').ohlc() both worked as well, but with no warning.

RichardDurham
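
A minimal sketch of the method-chaining syntax the warning above asks for, using a hypothetical TX column of made-up values. resample() on its own only defines the time buckets; the method called afterwards decides how each bucket is summarised. The older how= keyword was later removed from pandas, so this form is the one to use today. The annual rule is written as pd.offsets.YearEnd() rather than 'A', because newer pandas releases warn about the 'A' alias.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly values for a column named "TX" (not real HPI data).
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
df = pd.DataFrame({"TX": np.arange(100.0, 124.0)}, index=idx)

yearly = df["TX"].resample(pd.offsets.YearEnd())  # buckets only, no aggregation yet

# Old, deprecated form: df["TX"].resample("A", how="mean")
annual_mean = yearly.mean()
# Open/high/low/close of the monthly values inside each year.
annual_ohlc = yearly.ohlc()

print(annual_mean)
print(annual_ohlc)
```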
Author

Text version of this tutorial is awesome. Thanks!

spicytuna
Author

Hi Sentdex, thanks for the videos! Could you advise me on how I should go about averaging the full data for each time period? I.e., averaging all the states' housing prices for every month. Thanks!

samuelchia
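
A hedged sketch for averaging all states per month, assuming (hypothetically) that each state's HPI is a column and each row is a month, as in a frame built by joining the per-state series. mean(axis=1) averages across columns, row by row. This is a plain unweighted average of the states, not a population-weighted national index; the state codes and values below are made up.

```python
import pandas as pd

# Hypothetical layout: one row per month, one column per state.
idx = pd.date_range("2000-01-01", periods=3, freq="MS")
states = pd.DataFrame(
    {"TX": [100.0, 102.0, 104.0], "CA": [200.0, 204.0, 208.0], "NY": [150.0, 153.0, 156.0]},
    index=idx,
)

# axis=1: average across the state columns, giving one value per month.
all_states_monthly = states.mean(axis=1)
print(all_states_monthly)

# The per-month average can then be resampled like any other series.
all_states_quarterly = all_states_monthly.resample(pd.offsets.QuarterEnd()).mean()
print(all_states_quarterly)
```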
Author

How does it know that the date in the data is monthly? Is the date column a date type? Because if it is just a string, then pandas wouldn't know that it is monthly, right?

sameerzahid
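
A hedged answer sketch: pandas does not infer a sampling rate from strings. resample() needs the timestamps in a DatetimeIndex (or a TimedeltaIndex/PeriodIndex), so a text date column is usually converted with pd.to_datetime and moved into the index with set_index. The column names Date and value are hypothetical. resample() also does not need to know that the data is monthly: it bins whatever timestamps it finds into the requested periods.

```python
import pandas as pd

# Hypothetical frame where the dates arrived as plain text (e.g. from a CSV).
raw = pd.DataFrame({
    "Date": ["2000-01-31", "2000-02-29", "2000-03-31"],
    "value": [1.0, 2.0, 3.0],
})

# Parse the strings into real timestamps, then use them as the index.
raw["Date"] = pd.to_datetime(raw["Date"])
df = raw.set_index("Date")
print(type(df.index).__name__)  # DatetimeIndex

# Now pandas can bin by calendar period.
annual = df["value"].resample(pd.offsets.YearEnd()).mean()
print(annual)
```

When reading from a CSV, pd.read_csv(..., index_col=0, parse_dates=True) does the parsing and indexing in one step, assuming the dates are in the first column.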
Author

problem
"FutureWarning: how in .resample() is deprecated the new syntax is .resample(...).mean()

resample('A', how=mean) worked, but gave the warning.
resample('A').mean() and resample('A').ohlc() both worked as well, but with no warning"
solution
use
data['txyr1']

sayyamahmed
Author

Why does the opening value of year t+1 not match exactly the closing value of year t?
For instance: why is
1975-12-31 close: 6.336776 not equal to
1976-12-31 open: 6.5796775
Or am I wrong to assume that the opening of a year actually refers to the first of January of that year?

patrickmullan
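
A small sketch of why the two numbers differ, using made-up monthly values. With .ohlc() on annual bins, open is the first observation inside each year (January for month-start data) and close is the last one (December). The next year's open is January of the following year, a different month, so it only equals the previous close if the value did not change between December and January. The 1975 start date simply mirrors the dates in the question; the values are hypothetical.

```python
import pandas as pd

# Hypothetical monthly values: 0, 1, 2, ... for Jan 1975 to Dec 1976.
idx = pd.date_range("1975-01-01", periods=24, freq="MS")
s = pd.Series(range(24), index=idx, dtype=float)

ohlc = s.resample(pd.offsets.YearEnd()).ohlc()
print(ohlc)
# 1975 close = December 1975 value (11.0); 1976 open = January 1976 value (12.0).
```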
Author

I feel the graph displayed is not the mean; it can't be that nearly all the monthly data is lower than the annual average.

yuanhu
Author

could you please also do a pairs trading tutorial
many thanks
andrew

andrewczeizler
Author

Can you explain Kalman filter GPS estimation in Python?

asifnizamani
Author

OK, if you have a problem with "TX", replace it with "TX NSA Value".

FlottiFlotta
Author

For some reason, the label = 'TX...' isn't producing a label.

LongBoy.
Author

I'm not able to solve this error:

File "F:\ML\anaconda\lib\site-packages\pandas\core\resample.py", line 1116, in _get_resampler
"but got an instance of %r" % type(ax).__name__)

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'

hrushikeshgargote
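
A hedged sketch of what triggers this error and two common fixes. The frame below (hypothetical column names Date and value) has a plain integer index, which is what read_csv or reset_index() produce by default, so resample() raises the same TypeError; the message names whatever index type the frame has (RangeIndex here, Int64Index in the report above). Either pass the datetime column with the on= parameter or move it into the index.

```python
import pandas as pd

# Hypothetical data with the dates in an ordinary column and a default integer index.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2000-01-31", "2000-02-29", "2001-01-31"]),
    "value": [1.0, 3.0, 5.0],
})

try:
    df.resample(pd.offsets.YearEnd()).mean()
except TypeError as exc:
    print("TypeError:", exc)  # resample needs a datetime-like index

# Fix 1: name the datetime column to bin on.
by_column = df.resample(pd.offsets.YearEnd(), on="Date").mean()

# Fix 2: make the datetime column the index first.
by_index = df.set_index("Date").resample(pd.offsets.YearEnd()).mean()

print(by_column)
print(by_index)
```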
Author

Hi Harrison, I tried resampling my data and got an error: Only valid with DatetimeIndex. I pulled the data from Keen.IO, and it gives a timeframe with a start and end date in the same column. Any help you could provide would be awesome!
Thanks!

ciobolurker
Author

How do I resample data by 30 minutes or by 1 hour?

vikramsinha
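
A minimal sketch, assuming intraday data with a DatetimeIndex (the minute-level prices below are made up). Sub-daily rules are written as offset strings such as '30min' and '60min'; .ohlc() is a common choice for price bars and .mean() for smoothing. By default these bins are left-closed and labelled by their start time, so the 09:00 row covers 09:00 up to but not including 09:30.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute prices from 09:00 to 10:59.
idx = pd.date_range("2024-01-02 09:00", periods=120, freq="min")
prices = pd.Series(np.arange(120, dtype=float), index=idx)

half_hourly = prices.resample("30min").mean()  # bins: 09:00, 09:30, 10:00, 10:30
hourly = prices.resample("60min").mean()       # bins: 09:00, 10:00
hourly_bars = prices.resample("60min").ohlc()  # open/high/low/close per hour

print(half_hourly)
print(hourly_bars)
```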
Author

I can't get Quandl working. I did a pip install Quandl, but import Quandl isn't recognized.

oping
Author

"so this is actually *mis*leading" lmao

Dockmark
Author

tfw the automatically generated subs promise something much more appealing. 0:02

HelliOnurb
Author

Use resample on a non-numeric dataset, then we will see how much of a Python programmer you are.

robinranabhat