Percent Change and Correlation Tables - p.8 Data Analysis with Python and Pandas Tutorial


Welcome to Part 8 of our Data Analysis with Python and Pandas tutorial series. In this part, we're going to do some of our first manipulations on the data.


Comments

FYI, the fact that "k" represents black comes from black being the "key" value, not from "b" already being taken by blue. It comes from the CMYK color model, which stands for cyan, magenta, yellow, and key. They could just as easily have named the model CMYB. Now you know. =)

corey-thompson

To make this work properly, I had to do this.

def grab_initial_state_data():
    states = state_list()
    main_df = pd.DataFrame()  # start with an empty DataFrame

    for abbv in states:
        query = "FMAC/HPI_" + str(abbv)
        df = quandl.get(query, authtoken=api_key)

        df.rename(columns={'NSA Value': str(abbv) + ' NSA Value', 'SA Value': str(abbv) + ' SA Value'}, inplace=True)

        # Column name used in the formula below
        col_abbv_string = str(abbv) + ' NSA Value'

        # df = df.pct_change()  # period-over-period percent change; this works on its own
        # Formula version: percent change relative to the first row (the earliest date, 1970 or whatever)
        df[col_abbv_string] = (df[col_abbv_string] - df[col_abbv_string].iloc[0]) / df[col_abbv_string].iloc[0] * 100.0

        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df)

    print(main_df.head())

joro
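The two options in the code above (the commented-out pct_change() and the formula) compute different things. A minimal sketch of the difference, using made-up index values rather than real Quandl data:

```python
import pandas as pd

# Hypothetical values standing in for one state's HPI column.
hpi = pd.Series(
    [100.0, 110.0, 121.0],
    index=pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31"]),
    name="AL",
)

# Period-over-period change: each row compared with the previous row.
step_change = hpi.pct_change() * 100.0

# Cumulative change: each row compared with the first row.
from_start = (hpi - hpi.iloc[0]) / hpi.iloc[0] * 100.0

print(step_change.round(2).tolist())
print(from_start.round(2).tolist())
```

The first row of pct_change() is NaN because it has no previous row to compare against, while the formula version starts at 0.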

It seems that Quandl changed all column names to 'Value'.
I had a problem with this line:
df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0

It kept returning this error:
KeyError: 'AL'

I had to rename the columns in df like this:
df = df.rename(columns={'Value': abbv})

I also had to do it for 'United States'.

gemartintw

FYI they changed the column name of "United States" to: "United States not seasonally adjusted" on Quandl... I don't know why.

thomasibbotson

I find df.index to be a useful way to quickly peek at what you are dealing with. Using df.head() can get messy if there are too many columns, which is usually the case.

MaksimKupfer

Similarly, if you are having a problem with df = quandl.get("FMAC/HPI_USA", authtoken=api_key), just make sure you enter df.columns = ["United States"] after it.

manofnocountry

If you have problems with this code, just make sure df.columns = [str(abbv)] precedes df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0. The code in the video raises errors because Quandl has since renamed the columns.

manofnocountry
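The renaming fix described in these comments can be sketched without a Quandl account. The DataFrame below is a made-up stand-in for what quandl.get() returns (a single column named 'Value'), and to_percent_change_from_start is a hypothetical helper name, not something from the video:

```python
import pandas as pd

def to_percent_change_from_start(df, abbv):
    """Rename the single value column to `abbv`, then express it as
    percent change relative to the first row."""
    df = df.copy()               # leave the caller's frame untouched
    df.columns = [str(abbv)]     # works whatever the column was called
    first = df[abbv].iloc[0]
    df[abbv] = (df[abbv] - first) / first * 100.0
    return df

# Hypothetical data shaped like a single-column Quandl result.
raw = pd.DataFrame(
    {"Value": [50.0, 55.0, 60.0]},
    index=pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31"]),
)

al = to_percent_change_from_start(raw, "AL")
print(al)
```

Assigning to df.columns only works this way when the frame has exactly one column; with more columns, rename(columns={...}) is the safer choice.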

I have a problem with the api_key (I have already registered and I have the Quandl API key). At the line df = quandl.get(query, authtoken=api_key) the following error appears: raise klass(message, resp.status_code, resp.text, resp.headers, code)

NotFoundError: (Status 404) (Quandl Error QECx02) You have submitted an incorrect Quandl code. Please check your Quandl codes and try again. I'd appreciate any suggestions, thanks.

kevinalexanderchicaemeordo

Perfect, for Time Series data. Good job!

safaaal-wajidi

I had some problems joining the data, and after a million tests I found that once you have renamed the column with df.columns = [str(abbv)], you can merge the dataframes like this:

if main_df.empty:
    main_df = df
else:
    main_df = pd.concat([main_df, df], axis=1)

It works for me, and I hope it helps others.

m
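A small sketch of the join and concat approaches on made-up data (not the real HPI values). With unique column names and a shared index they give the same result; join raises an error when two frames share a column name, which is what happens if every frame still has a column called 'Value':

```python
import pandas as pd

dates = pd.to_datetime(["1975-01-31", "1975-02-28"])
al = pd.DataFrame({"AL": [0.0, 1.5]}, index=dates)
ak = pd.DataFrame({"AK": [0.0, 2.0]}, index=dates)

# With distinct column names, both approaches line the frames up by index.
joined = al.join(ak)
concatenated = pd.concat([al, ak], axis=1)
print(joined.equals(concatenated))

# join refuses overlapping column names unless a suffix is supplied.
clash = pd.DataFrame({"AL": [9.0, 9.0]}, index=dates)
try:
    al.join(clash)
    join_failed = False
except ValueError:
    join_failed = True
print(join_failed)
```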

When I run the same code you have, I get "FileNotFoundError: [Errno 2] No such file or directory: 'fiddy_states3.pickle'". Did I miss something, or doesn't pickle create the file on the fly?

alsherbin
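Reading a pickle does not create it: the file has to be written by an earlier step before it can be loaded. A minimal sketch using a temporary directory and made-up data; the file name is borrowed from the error message above, and the pickle module itself ships with Python, so there is nothing to install:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the combined state data.
df = pd.DataFrame({"AL": [0.0, 1.5], "AK": [0.0, 2.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fiddy_states3.pickle")

    # Nothing exists yet, so reading at this point would raise FileNotFoundError.
    exists_before = os.path.exists(path)

    df.to_pickle(path)               # write step: this is what creates the file
    restored = pd.read_pickle(path)  # read step: only works after the write

print(exists_before, restored.equals(df))
```

The file is also looked up relative to the current working directory, so a script run from a different folder will not find a pickle saved elsewhere.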

When I add df = df.pct_change(), it still prints out the way it did originally, as a normal graph and not as percent change. Any ideas why?

Dockmark

Hi sentdex, I have a question here! If I add up the percent change of all states and divide by 50, starting from '1975-02-28', it should equal the percent change of the USA HPI for that same year, right?

OLable

Can someone explain the logic of this statement, please?
df[abbv] = (df[abbv] - df[abbv][0]) / df[abbv][0] * 100.0
My understanding was that df[abbv] is the current DataFrame, for example, the entire DataFrame for 'AL' for all the years, meaning 498 values.
df[abbv][0] is just the first year's value in the current DataFrame. So how does it work conceptually when we do arithmetic between 498 values and a single value? What does it actually do? Does it take the percentage change of all 498 values relative to the first value?
Thanks

kuatroka
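On the question of arithmetic between a whole column and a single value: pandas applies a scalar to every element of a Series (broadcasting), so each value is compared with the first one. A minimal sketch with three made-up values instead of 498:

```python
import pandas as pd

column = pd.Series([200.0, 210.0, 190.0], name="AL")

first = column.iloc[0]  # a single number: 200.0

# The scalar is applied element by element across the whole column.
change = (column - first) / first * 100.0

print(change.tolist())
```

A negative result here simply means the value is below the starting value.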

This is so interesting man. Thank you so much.

andyhutch

What are these calculations you're talking about that start at 100? I didn't quite get this part or why the values converge at 100...

rodrigosilvanader

hi, i am mainly starting out and using jupyter from anaconda, will the code still work?

louisscicluna

hi, i'm getting this error: FileNotFoundError: [Errno 2] No such file or directory: 'pickle.pickle'. any clues on how to solve it? I tried pip install pickle but it gives me the same error

kmillanr

There is actually a problem when you compute percentages like this:
"df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0"
The result can be a negative value, which does not make sense for a percentage.

mozarter

I am surprised that df[abbv] worked and did not throw an error. I thought that the correct syntax is either df['abbv'] or df.abbv. I also have a data frame with 1 column and it does not allow me to reference it like Harrison did. Any thoughts?

lalu
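On df[abbv] versus df['abbv']: without quotes, abbv is a variable, and pandas looks up whatever string it currently holds. A minimal sketch on a made-up one-column frame:

```python
import pandas as pd

df = pd.DataFrame({"AL": [1.0, 2.0]})
abbv = "AL"  # the variable holds the column name

by_variable = df[abbv]  # looks up the column named "AL"
by_literal = df["AL"]   # the same lookup, written out

# With quotes, pandas looks for a column literally named "abbv".
try:
    df["abbv"]
    literal_abbv_found = True
except KeyError:
    literal_abbv_found = False

print(by_variable.equals(by_literal), literal_abbv_found)
```

If df[abbv] fails on your own frame, printing df.columns shows which names are actually available.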
