Rolling statistics - p.11 Data Analysis with Python and Pandas Tutorial


Welcome to another part of the data analysis with Python and Pandas tutorial series, where we become real estate moguls. In this tutorial, we're going to cover applying various rolling statistics to the data in our dataframes.

One of the more popular rolling statistics is the moving average. This takes a moving window of time and calculates the average, or mean, of that time period as the current value. In our case, we have monthly data, so a 10-month moving average would be the current value plus the previous 9 months of data, averaged. Doing this in Pandas is incredibly fast.

Pandas comes with a few pre-made rolling statistical functions, but also has one called rolling_apply. This allows us to write our own function that accepts window data and applies any reasonable bit of logic we want. This means that even if Pandas doesn't officially have a function to handle what you want, you are covered and can write exactly what you need. Let's start with a basic moving average, or rolling_mean as Pandas calls it. You can check out all of the moving/rolling statistics in the Pandas documentation.
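
As a minimal sketch of the moving average and custom window function described above: this assumes a recent pandas release, where the top-level rolling_mean and rolling_apply functions have been replaced by the `.rolling()` method. The monthly series is synthetic stand-in data (not the HPI dataset), and `window_range` is a hypothetical custom statistic chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data standing in for one state's HPI column (assumption).
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
monthly = pd.Series(np.arange(24, dtype=float), index=idx)

# 10-month moving average: the current value plus the previous 9, averaged.
ma10 = monthly.rolling(window=10).mean()


def window_range(values):
    """Hypothetical custom statistic: max minus min within the window."""
    return values.max() - values.min()


# rolling().apply() calls the function once per window;
# raw=True hands each window over as a NumPy array.
range10 = monthly.rolling(window=10).apply(window_range, raw=True)

print(ma10.iloc[8:11])
print(range10.iloc[8:11])
```

The first 9 rows of both results are NaN, because a full 10-month window is not available until the tenth row.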


Comments

Hey Harrison, thanks for the awesome tutorials!

Anyone getting deprecation warnings on:
TX_AK_12corr = pd.rolling_corr(HPI_data['TX'], HPI_data['AK'], 12)

can use this instead and it works fine:
TX_AK_12corr =

:)

joephillips
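
The replacement line in the comment above did not survive on this page, so the following is only a sketch of how a 12-month rolling correlation can be written with the method-style API in recent pandas versions. It is not necessarily what the commenter posted. The `HPI_data` frame here is random stand-in data, not the tutorial's dataset.

```python
import numpy as np
import pandas as pd

# Random-walk stand-in for the tutorial's HPI_data frame (assumption).
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
HPI_data = pd.DataFrame(
    {"TX": rng.normal(size=36).cumsum(), "AK": rng.normal(size=36).cumsum()},
    index=idx,
)

# Method-style counterpart of pd.rolling_corr(HPI_data['TX'], HPI_data['AK'], 12)
TX_AK_12corr = HPI_data["TX"].rolling(12).corr(HPI_data["AK"])

print(TX_AK_12corr.tail())
```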

If you get the error 'pandas has no attribute - rolling mean'

Then use

HPI_data['TX12MA'] =

This syncs in with the new version

rishimalhotra
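
For readers hitting the same missing-attribute error, here is a hedged sketch of how the old top-level calls map onto the `.rolling()` method in recent pandas versions. The right-hand side of the commenter's line is missing from this page, so this is an illustration and not a reconstruction of it. The `HPI_data` frame is a small synthetic stand-in.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the TX column: 100, 101, ..., 123 (assumption).
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
HPI_data = pd.DataFrame({"TX": np.linspace(100.0, 123.0, 24)}, index=idx)

# Old style: pd.rolling_mean(HPI_data['TX'], 12)
HPI_data["TX12MA"] = HPI_data["TX"].rolling(12).mean()
# Old style: pd.rolling_std(HPI_data['TX'], 12)
HPI_data["TX12STD"] = HPI_data["TX"].rolling(12).std()

# The first 11 rows are NaN because a full 12-month window is not yet available.
print(HPI_data.iloc[10:13])
```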

Along the lines of what +EKV alluded to below, here is the new rolling correlation syntax, plus a slight change to the correlation plot code so that it unstacks the index. The new syntax apparently returns a correlation matrix for each index item (i.e. each date), so the index gains a new hierarchical level in the form of the correlation pair. New to all this, so please correct me if I misunderstood something. Here's the revised code based on pandas 0.20.2:

fig = plt.figure()
ax1 = plt.subplot2grid((2, 1), (0, 0))
ax2 = plt.subplot2grid((2, 1), (1, 0), sharex=ax1)
HPI_data =

TX_AK_12corr = HPI_data[['TX', 'AK']].rolling(12).corr()

HPI_data['TX'].plot(ax=ax1, label="TX_HPI")
HPI_data['AK'].plot(ax=ax1, label="AK_HPI")
ax1.legend(loc=4)

TX_AK_12corr.unstack(level=1)[('TX', 'AK')].plot(ax=ax2)

plt.show()

mequals
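
To make the extra index level described in the comment above concrete, here is a small sketch without the plotting code. It uses random stand-in data (an assumption, not the tutorial's pickle file) and checks that the unstacked ('TX', 'AK') column matches the pairwise Series-to-Series rolling correlation.

```python
import numpy as np
import pandas as pd

# Random-walk stand-in for the tutorial's HPI_data frame (assumption).
rng = np.random.default_rng(1)
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
HPI_data = pd.DataFrame(
    {"TX": rng.normal(size=36).cumsum(), "AK": rng.normal(size=36).cumsum()},
    index=idx,
)

# DataFrame-level rolling corr: one 2x2 correlation matrix per date,
# stored with a two-level (date, column) row index.
pairwise = HPI_data[["TX", "AK"]].rolling(12).corr()

# Unstacking the inner level moves the pair into the columns,
# so ('TX', 'AK') is a plain date-indexed Series again.
tx_ak_from_matrix = pairwise.unstack(level=1)[("TX", "AK")]

# Same quantity computed directly between the two columns.
tx_ak_direct = HPI_data["TX"].rolling(12).corr(HPI_data["AK"])

print(pairwise.tail(4))
```

If only one pair is needed, the direct Series form avoids building the full matrix for every date.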

Nice vid.

A comment about the investment "tip" you gave at the end: a correlation of -1 (or close to it) is not necessarily a good time to buy a house (or a stock, or any asset). It probably is when we're talking about an upward trend and a general rise in housing prices. But if, say, the market goes down and one state is still going up, a correlation of -1 is more of an indication that the game is over for that state and that it will soon follow the rest of the pack.
You can more or less see it in the data you have here - around 85-86, the correlation between Alaska and Texas almost reaches -1, but then soon enough it goes back to 1. Was that a good time to buy? No - Alaska went back to being correlated with Texas - and they both went down.

RealMcDudu

Hey Harrison, it's fantastic!!! Thanks for the awesome tutorials, keep up the good work.

suryaprasad

sentdex, are you making investments in the 'Housing Market' based on correlation between States? As you statistically analyzed from the data.

EranM

After I finish one of your videos and start watching another from this playlist, your subscriber count goes up by around 3 to 4 every time :D Did you make some loop there lol :) Thanks for the great videos, sentdex!

ArminAlibasic

hey Harrison! awesome series - just feeling that the groupby feature got left out in the tutorials and that is something very useful for all data analysts. also on your website - could you consider adding seaborn tutorials in the data viz section? thanks and keep rockin'

asneogy

Hi sentdex, good videos. I have a question: how can I do rolling linear regression in pandas? As I understand it, pd.stats.ols.MovingOLS() has been removed as of version 0.20.0.

ankitsoni
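
One possible way to approach the rolling regression question above, for the single-regressor case only: the OLS slope equals cov(x, y) / var(x), and both pieces are available as rolling statistics. This is a sketch under that assumption, not a replacement for MovingOLS. `rolling_linear_fit` is a hypothetical helper name and the data is synthetic. For multiple regressors, the statsmodels package may be a better fit, as it provides a RollingOLS class.

```python
import numpy as np
import pandas as pd


def rolling_linear_fit(x, y, window):
    """Rolling simple OLS of y on x (hypothetical helper).

    slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x).
    """
    slope = x.rolling(window).cov(y) / x.rolling(window).var()
    intercept = y.rolling(window).mean() - slope * x.rolling(window).mean()
    return pd.DataFrame({"slope": slope, "intercept": intercept})


# Synthetic data with an exact linear relation, so the expected fit is known.
rng = np.random.default_rng(2)
x = pd.Series(rng.normal(size=60))
y = 2.0 * x + 1.0

fit = rolling_linear_fit(x, y, window=12)
print(fit.tail())
```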

Thanks for the video. Would you mind doing some more videos on Monte Carlo testing?

oliverward

Had some trouble with the pd.rolling syntax; I got the error message "AttributeError: module 'pandas' has no attribute 'rolling'". Turns out, in the new version of pandas (0.19.0) the syntax for rolling is:

DataFrame.rolling(window, min_periods=None, freq=None, center=False, win_type=None, on=None, axis=0),

meaning the syntax for the example in the tutorial is

ekv

up
HPI_data['TX12MA'] = HPI_data['TX'].rolling(window=12).mean()
HPI_data['TX12STD'] = HPI_data['TX'].rolling(window=12).std()

DePyton

Instead of having a gap the size of the window at the start of my graph it is at the end. Do you know why that could be?

tannerm
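
One possible cause of the gap showing up at the end, offered as a guess since the commenter's data is not shown: rolling windows follow the stored row order, so if the frame is sorted newest-first, the NaN warm-up rows are the most recent dates. The sketch below uses synthetic data to show the effect and the effect of sorting the index first.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=24, freq="MS")
ascending = pd.Series(np.arange(24, dtype=float), index=idx)
descending = ascending.sort_index(ascending=False)  # newest row first

# rolling() walks rows in stored order, so the NaN warm-up covers the
# first 11 rows, which here are the 11 most recent dates.
ma_desc = descending.rolling(12).mean()

# Sorting the index oldest-first before rolling puts the gap at the start.
ma_fixed = descending.sort_index().rolling(12).mean()

print(ma_desc.sort_index().tail(3))
print(ma_fixed.head(3))
```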

and having really hard time reproducing the code - even before the actual part of the rolling statistics starts. Quandl import (with capital Q) - doesn't work for python3.7. The function HPI_Benchmark() doesn't work for me. Neither does grab_initial_state_data(). Also, grab_initial_state_data() is never called, and I couldn't understand where the fiddy_states3.pickle file comes from. Is there an updated tutorial? Thanks in advance!

katyaarnold

TX_AK_12corr =
this function returns null values in latest python and pandas versions

saurabhpoojary

+sentdex Brilliant, Brilliant, Brilliant. Thanks a lot

theword

does anyone know how to compare two moving averages? should i find the mean of the entire say 150 days and compare it to the mean of the 200 days? or should i take another approach?

RandomShowerThoughts
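
On comparing two moving averages, one common approach (sketched here on synthetic daily prices, which are an assumption) is to compute both rolling means as full series and compare them date by date, instead of comparing a single 150-day mean against a single 200-day mean. The names `spread` and `crossovers` are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic daily price series (assumption).
rng = np.random.default_rng(3)
idx = pd.date_range("2015-01-01", periods=400, freq="D")
price = pd.Series(100 + rng.normal(size=400).cumsum(), index=idx)

ma150 = price.rolling(150).mean()
ma200 = price.rolling(200).mean()

# Difference between the two averages on each date where both exist.
spread = (ma150 - ma200).dropna()

# True on dates where the 150-day average is above the 200-day average.
fast_above = spread > 0

# A crossover is a date where that relationship flips versus the previous date.
changed = fast_above.astype(int).diff().fillna(0) != 0
crossovers = spread.index[changed.to_numpy()]

print(spread.tail())
print(crossovers)
```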

When I define ax2, ax1 is completely deleted, and I can't see anything plotted on it. So only the bottom half of the screen is taken up by ax2, any help?

Mohammed-dejm

Hello sentdex, would you please make a tutorial about Odoo / Python? I would appreciate it, and thank you for all the information you gave. I'm not a data analyst but I really enjoyed using pandas; I may use it to manipulate tables in an msql database. Thank you again :)

gharbisalem

+sentdex Thanks for your awesome videos.

HPI_data['TX12STD'] =
this returns only NaN values for all the rows (not just the first 11 rows) in Python 3.7.0, pandas 0.23.3

I found a workaround from google which solved the issue but I am afraid I don't understand why or how it works. Could you please explain it? (I guess it has to do something with ddof of 0 or 1, but I have no idea)
HPI_data['TX12STD'] = x: pd.np.std(x))

aravindsivalingam
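
On the ddof question above, here is a hedged sketch of how the two calculations differ. The pandas rolling std defaults to ddof=1 (sample standard deviation), while NumPy's std defaults to ddof=0 (population standard deviation), so a NumPy-based workaround gives slightly smaller numbers. This does not by itself explain why the built-in call returned all NaN in the commenter's setup. NumPy is imported directly here because the pd.np alias is not available in newer pandas releases. The data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the TX column (assumption).
rng = np.random.default_rng(4)
tx = pd.Series(rng.normal(loc=100, scale=5, size=36))

# Built-in rolling std: sample standard deviation (ddof=1 by default).
std_sample = tx.rolling(12).std()

# Same built-in with ddof=0: population standard deviation.
std_population = tx.rolling(12).std(ddof=0)

# Custom-function route: np.std uses ddof=0 unless told otherwise.
std_via_numpy = tx.rolling(12).apply(np.std, raw=True)

print(pd.DataFrame({
    "ddof=1": std_sample,
    "ddof=0": std_population,
    "np.std": std_via_numpy,
}).tail())
```

For a 12-row window the two conventions differ by a constant factor of sqrt(12/11).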

Computing rolling statistics

Python - Rolling Mean and Standard Deviation - Part 1

Quick way to navigate Pandas rolling correlations

Probability of a Dice Roll | Statistics & Math Practice | JusticeTheTutor #shorts #math #maths

Python Rolling Window Functions explained in 4 minutes

How to Calculate Simple Moving Average & Standard Deviation in python

Python Pandas || Moving Averages and Rolling Window Statistics for Stock Prices

#55 Pandas (Part 32): Rolling window function: win_type = Gaussian in Python | Tutorial

Rolling Statistics In Time Series | Stationarity Check | Machine Learning | Data Magic AI

Lightning Fast Rolling Averages : Data Science Code

Rolling Window Calculations on Excel Data - Simple Moving Average

#56 Pandas (Part 33): Rolling window function: win_type = Exponential in Python | Tutorial

#54 Pandas (Part 31): Intuition and code to calculate rolling mean and sum in Python | Tutorial

The 7 Day Rolling Average: What it is and Why it is Important!

Roll 1 Die - Intro to Descriptive Statistics

How to calculate rolling / moving average using python + NumPy / SciPy?

19 Pandas tutorial | rolling sum | rolling mean | rolling count | rolling variance | rolling corr

Statistics: Ch 4 Probability in Statistics (11 of 74) Probability of Rolling a '6'

Python - Rolling Mean and Standard Deviation - Part 2

Statistical Significance, the Null Hypothesis and P-Values Defined & Explained in One Minute

Handling Missing Data - p.10 Data Analysis with Python and Pandas Tutorial

Roll 2 Dice - Intro to Descriptive Statistics

How to Use Pandas Rolling - A Simple Illustrated Guide