Rolling statistics - p.11 Data Analysis with Python and Pandas Tutorial

Welcome to another data analysis with Python and Pandas tutorial series, where we become real estate moguls. In this tutorial, we're going to be covering the application of various rolling statistics to our data in our dataframes.

One of the more popular rolling statistics is the moving average. This takes a moving window of time and calculates the average, or mean, of that time period as the current value. In our case, we have monthly data, so a 10-month moving average would be the current value plus the previous 9 months of data, averaged. Doing this in Pandas is incredibly fast. Pandas comes with a few pre-made rolling statistical functions, but also has one called rolling_apply. This allows us to write our own function that accepts window data and applies any reasonable bit of logic we want. This means that even if Pandas doesn't officially have a function to handle what you want, it has you covered and allows you to write exactly what you need. Let's start with a basic moving average, or a rolling_mean as Pandas calls it. You can check out all of the moving/rolling statistics in Pandas' documentation.
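As a rough sketch of the moving average described above: newer pandas versions expose these statistics through the .rolling() method rather than rolling_mean and rolling_apply, which is what several of the comments on this page point out. The series below is synthetic stand-in data, and the names hpi, ma10, window_range and range10 are made up for illustration, not taken from the tutorial.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the tutorial's HPI data (assumption).
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
hpi = pd.Series(np.arange(24, dtype=float), index=idx, name="TX")

# 10-month moving average: the current month plus the previous 9 months.
ma10 = hpi.rolling(window=10).mean()

# A custom window function, the .rolling() counterpart of rolling_apply.
def window_range(values):
    # values is a NumPy array because raw=True is passed below.
    return values.max() - values.min()

range10 = hpi.rolling(window=10).apply(window_range, raw=True)

# The first 9 rows are NaN because a full 10-month window is not yet available.
print(ma10.head(12))
print(range10.head(12))
```

The leading NaN values are expected: by default a window only produces a result once it holds the full number of observations.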

Comments

Hey Harrison, thanks for the awesome tutorials!

Anyone getting deprecation warnings on:
TX_AK_12corr = pd.rolling_corr(HPI_data['TX'], HPI_data['AK'], 12)

can use this instead and it works fine:
TX_AK_12corr =

:)

joephillips

If you get the error 'pandas has no attribute - rolling mean'

Then use

HPI_data['TX12MA'] =

This syncs in with the new version

rishimalhotra

Along the lines of what +EKV alluded to below, here is the new rolling correlation syntax and a slight change to the correlation plot code so that it unstacks the index. The new syntax apparently returns a correlation matrix for each index item (i.e., each date), so the index gains a new hierarchical level in the form of the correlation pair. I'm new to all this, so please correct me if I misunderstood something. Here's the revised code based on pandas 0.20.2:

fig = plt.figure()
ax1 = plt.subplot2grid((2, 1), (0, 0))
ax2 = plt.subplot2grid((2, 1), (1, 0), sharex=ax1)
HPI_data =

TX_AK_12corr = HPI_data[['TX', 'AK']].rolling(12).corr()

HPI_data['TX'].plot(ax=ax1, label="TX_HPI")
HPI_data['AK'].plot(ax=ax1, label="AK_HPI")
ax1.legend(loc=4)

TX_AK_12corr.unstack(level=1)[('TX', 'AK')].plot(ax=ax2)

plt.show()
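For comparison, a minimal sketch of the same rolling correlation computed directly between two columns, which returns a plain Series and so needs no unstacking. The data here is random stand-in data (the real HPI_data is not reproduced), and the names data, pairwise, from_matrix and direct are hypothetical.

```python
import numpy as np
import pandas as pd

# Random-walk stand-ins for the TX and AK columns (assumption, not the real HPI data).
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
data = pd.DataFrame(
    {
        "TX": rng.normal(size=36).cumsum(),
        "AK": rng.normal(size=36).cumsum(),
    },
    index=idx,
)

# Pairwise form: one 2x2 correlation matrix per date, stacked in a MultiIndex.
pairwise = data[["TX", "AK"]].rolling(12).corr()
from_matrix = pairwise.unstack(level=1)[("TX", "AK")]

# Direct form: correlate one column against the other, giving a Series per date.
direct = data["TX"].rolling(12).corr(data["AK"])

print(direct.tail())
```

Both forms should give the same numbers; the direct form is simply easier to plot when only one pair of columns is of interest.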

mequals

Nice vid.

A comment about the investment "tip" you gave at the end: a correlation of -1 (or close to it) is not necessarily a good time to buy a house (or a stock, or any asset). It probably is when we're talking about an upward line and a general rise in housing prices. But if, say, the market goes down and you have one state which is still going up, a correlation of -1 is more of an indication that the game is over for that state and it will soon follow the rest of the pack.
You can more or less see it in the data you have here - around 85-86, the correlation between Alaska and Texas almost reaches -1, but then soon enough it goes back to 1. Was that a good time to buy? No - Alaska went back to being correlated with Texas - and they both went down.

RealMcDudu

Hey Harrison, it's fantastic!!! Thanks for the awesome tutorials, keep doing the good work.

suryaprasad

sentdex, are you making investments in the 'Housing Market' based on the correlation between states, as you statistically analyzed from the data?

EranM

Every time I finish one of your videos and start watching another from this playlist, your number of subscribers has gone up by around 3 to 4 :D Did you make some loop there lol :) Thanks for the great videos, sentdex!

ArminAlibasic

hey Harrison! awesome series - just feeling that the groupby feature got left out in the tutorials and that is something very useful for all data analysts. also on your website - could you consider adding seaborn tutorials in the data viz section? thanks and keep rockin'

asneogy

Hi sentdex, good videos. I have a question: how can I do rolling linear regression in pandas? As I understand it, pd.stats.ols.MovingOLS() has been removed as of version 0.20.0.
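One possible approach, sketched here under assumptions rather than as a drop-in replacement for MovingOLS: for a single predictor, the least-squares slope in each window equals the rolling covariance divided by the rolling variance, and both are available through .rolling(). The data is synthetic and the names x, y, slope and intercept are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data with a known relationship y = 2x + 1 plus noise (assumption).
rng = np.random.default_rng(1)
n = 60
x = pd.Series(rng.normal(size=n).cumsum())
y = 2.0 * x + 1.0 + pd.Series(rng.normal(scale=0.1, size=n))

window = 12

# Simple linear regression per window:
#   slope = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
slope = x.rolling(window).cov(y) / x.rolling(window).var()
intercept = y.rolling(window).mean() - slope * x.rolling(window).mean()

print(slope.tail())
print(intercept.tail())
```

This only covers one explanatory variable; regressions with several predictors would need a different approach.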

ankitsoni

Thanks for the video. Would you mind doing some more videos on Monte Carlo testing?

oliverward

Had some trouble with the pd.rolling syntax and got the error message "Attribute Error: module pandas has no attribute 'rolling'". It turns out that in the new version of pandas (0.19.0) the syntax for rolling is:

DataFrame.rolling(window, min_periods=None, freq=None, center=False, win_type=None, on=None, axis=0),

meaning the syntax for the example in the tutorial is



ekv

up
HPI_data['TX12MA'] = HPI_data['TX'].rolling(window=12).mean()
HPI_data['TX12STD'] = HPI_data['TX'].rolling(window=12).std()

DePyton

Instead of having a gap the size of the window at the start of my graph it is at the end. Do you know why that could be?

tannerm

and having really hard time reproducing the code - even before the actual part of the rolling statistics starts. Quandl import (with capital Q) - doesn't work for python3.7. The function HPI_Benchmark() doesn't work for me. Neither does grab_initial_state_data(). Also, grab_initial_state_data() is never called, and I couldn't understand where the fiddy_states3.pickle file comes from. Is there an updated tutorial? Thanks in advance!

katyaarnold

TX_AK_12corr =
this function returns null values in latest python and pandas versions

saurabhpoojary

+sentdex Brilliant, Brilliant, Brilliant. Thanks a lot

theword

Does anyone know how to compare two moving averages? Should I find the mean of, say, the entire 150 days and compare it to the mean of the 200 days, or should I take another approach?
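One way to approach this, offered as a sketch rather than a recommendation: compute both rolling means over the same series and compare them row by row, instead of comparing two single overall means. The price series is synthetic, and the names prices, short_ma, long_ma, short_above and crossover_dates are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (assumption); substitute your own data.
rng = np.random.default_rng(2)
idx = pd.date_range("2020-01-01", periods=400, freq="D")
prices = pd.Series(100 + rng.normal(size=400).cumsum(), index=idx)

short_ma = prices.rolling(150).mean()
long_ma = prices.rolling(200).mean()

# Keep only the dates where both averages are defined.
both = pd.DataFrame({"short": short_ma, "long": long_ma}).dropna()

# Row-by-row comparison: is the 150-day average above the 200-day average?
short_above = both["short"] > both["long"]

# Dates where that relationship flips from one day to the next.
flips = short_above.astype(int).diff().fillna(0).ne(0)
crossover_dates = both.index[flips.to_numpy()]

print(short_above.value_counts())
print(crossover_dates)
```

The comparison only starts once the longer window is full, so the first 199 days drop out.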

RandomShowerThoughts

When I define ax2, ax1 is completely deleted, and I can't see anything plotted on it. So only the bottom half of the screen is taken up by ax2, any help?

Mohammed-dejm

Hello sentdex, would you please make a tutorial about Odoo / Python? I would appreciate it, and thank you for all the information you gave. I'm not a data analyst, but I really enjoyed using pandas, and I may use it to manipulate tables in an msql database. Thank you again :)

gharbisalem

+sentdex Thanks for your awesome videos.

HPI_data['TX12STD'] =
this returns only NaN values for all the rows (not just the first 11 rows) in Python 3.7.0, pandas 0.23.3

I found a workaround from google which solved the issue but I am afraid I don't understand why or how it works. Could you please explain it? (I guess it has to do something with ddof of 0 or 1, but I have no idea)
HPI_data['TX12STD'] = x: pd.np.std(x))
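On the ddof guess, a small sketch of how the two calculations differ. It does not explain why the original call returned only NaN, which cannot be determined from the comment alone. The built-in rolling std defaults to ddof=1 (sample standard deviation), while np.std defaults to ddof=0 (population standard deviation). The data is synthetic and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Simple synthetic series with no missing values (assumption).
values = pd.Series(np.arange(1.0, 25.0))

# Built-in rolling std: sample standard deviation (ddof=1 by default).
sample_std = values.rolling(12).std()

# Same method with ddof=0: population standard deviation.
population_std = values.rolling(12).std(ddof=0)

# Applying np.std to each window also gives the population version,
# because np.std defaults to ddof=0.
via_numpy = values.rolling(12).apply(np.std, raw=True)

print(pd.DataFrame({
    "sample": sample_std,
    "population": population_std,
    "via_numpy": via_numpy,
}).tail())
```

With both approaches only the first 11 rows should be NaN on clean data, so the two columns computed with ddof=0 should agree while the sample version comes out slightly larger.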

aravindsivalingam