Percent Change and Correlation Tables - p.8 Data Analysis with Python and Pandas Tutorial

Welcome to Part 8 of our Data Analysis with Python and Pandas tutorial series. In this part, we're going to do some of our first manipulations on the data.

Comments

FYI, the fact that "k" represents black comes from black being the "key" color, not from "b" already being taken by blue. It comes from the CMYK color model, which stands for cyan, magenta, yellow, and key. They could just as easily have named the model CMYB. Now you know. =)

corey-thompson

To make this work properly, I had to do this.

import pandas as pd
import quandl

# state_list() and api_key are defined elsewhere in the tutorial series
def grab_initial_state_data():
    states = state_list()
    main_df = pd.DataFrame()  # creates just an empty DataFrame

    for abbv in states:
        query = "FMAC/HPI_" + str(abbv)
        df = quandl.get(query, authtoken=api_key)

        # prefix the columns with the state so join() does not see duplicate names
        df.rename(columns={'NSA Value': str(abbv) + ' NSA Value',
                           'SA Value': str(abbv) + ' SA Value'}, inplace=True)

        col_abbv_string = str(abbv) + ' NSA Value'  # set up string for convenience in the formula below

        # df = df.pct_change()  # period-over-period percent change; this works stand-alone
        # formula version: percent change relative to the first row (the original date, 1970 or whatever)
        first_value = df[col_abbv_string].iloc[0]
        df[col_abbv_string] = (df[col_abbv_string] - first_value) / first_value * 100.0

        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df)

    print(main_df.head())

joro
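The renaming step above matters because `DataFrame.join` refuses to combine frames that share a column name unless suffixes are given. A minimal offline sketch, using two hypothetical states with made-up values instead of Quandl downloads:

```python
import pandas as pd

# Made-up index values for two hypothetical states (not real Quandl data)
dates = pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31"])
al = pd.DataFrame({"NSA Value": [50.0, 51.0, 52.5]}, index=dates)
ak = pd.DataFrame({"NSA Value": [60.0, 59.0, 61.2]}, index=dates)

# Both frames have a column called 'NSA Value', so join raises ValueError
try:
    al.join(ak)
    overlap_error = False
except ValueError:
    overlap_error = True

# After prefixing each column with the state abbreviation, join lines rows up by date
al = al.rename(columns={"NSA Value": "AL NSA Value"})
ak = ak.rename(columns={"NSA Value": "AK NSA Value"})
joined = al.join(ak)

print(overlap_error)
print(list(joined.columns))
```

Passing `lsuffix`/`rsuffix` to `join` is the other way around the clash, but prefixing with the state name keeps the columns readable.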

It seems that Quandl changed all column names to 'Value'.
I had a problem with this line:
df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0

It kept returning this error:
KeyError: 'AL'

I had to rename the columns in df like this:
df = df.rename(columns={'Value': abbv})

I also had to do the same for 'United States'.

gemartintw
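The `KeyError: 'AL'` follows from the column being named 'Value': `df[abbv]` looks for a column literally named 'AL', which does not exist until the rename. A small sketch on made-up data shaped like that output (the values and the single-column layout are assumptions for illustration):

```python
import pandas as pd

# Made-up data with a single column named 'Value', as described in the comment
dates = pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31"])
df = pd.DataFrame({"Value": [50.0, 51.0, 52.5]}, index=dates)
abbv = "AL"  # hypothetical state abbreviation

try:
    df[abbv]
    missing = False
except KeyError:
    missing = True  # no column named 'AL' yet

# Rename 'Value' to the abbreviation; the percent-change formula then finds the column
df = df.rename(columns={"Value": abbv})
first = df[abbv].iloc[0]  # first row by position
df[abbv] = (df[abbv] - first) / first * 100.0
print(df[abbv].tolist())
```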

FYI, they changed the column name "United States" to "United States not seasonally adjusted" on Quandl... I don't know why.

thomasibbotson

I find df.index to be a useful way to quickly peek at what you are dealing with. Using df.head() can get messy if there are too many columns, which is usually the case.

MaksimKupfer
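Besides `df.index`, a few standard pandas attributes and methods summarize a wide frame without printing every column. A sketch on a hypothetical frame with 50 made-up columns:

```python
import pandas as pd

# Hypothetical wide frame: 50 made-up columns over 3 dates
dates = pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31"])
wide = pd.DataFrame({f"S{i}": [1.0, 2.0, 3.0] for i in range(50)}, index=dates)

print(wide.shape)              # (number of rows, number of columns)
print(wide.index)              # the dates
print(list(wide.columns)[:5])  # just the first few column names
wide.info()                    # dtypes and non-null counts per column
```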

Similarly, if you are having a problem with df = quandl.get("FMAC/HPI_USA", authtoken=api_key), just make sure you add df.columns = ["United States"] after it.

manofnocountry

If you run into problems with this code, just make sure df.columns = [str(abbv)] comes before df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0. The code in the video now errors because Quandl renamed the columns.

manofnocountry

I have a problem with the API key (I have already registered and have my Quandl API key). On the line df = quandl.get(query, authtoken=api_key) the following error appears: raise klass(message, resp.status_code, resp.text, resp.headers, code)

NotFoundError: (Status 404) (Quandl Error QECx02) You have submitted an incorrect Quandl code. Please check your Quandl codes and try again. Any suggestions are appreciated, thanks.

kevinalexanderchicaemeordo

Perfect for time series data. Good job!

safaaal-wajidi

I had a problem joining the data, and after a million tests I found that if you have renamed df with df.columns = [str(abbv)], you can merge the DataFrames like this:

if main_df.empty:
    main_df = df
else:
    main_df = pd.concat([main_df, df], axis=1)

It works for me, and I hope it helps others.

m
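`pd.concat(..., axis=1)` places the new columns side by side, aligned on the shared date index. A sketch with two hypothetical single-column frames (made-up values) showing that, once the column names are unique and the index is shared, it matches what `join` produces:

```python
import pandas as pd

dates = pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31"])
frames = [
    pd.DataFrame({"AL": [0.0, 2.0, 5.0]}, index=dates),   # made-up values
    pd.DataFrame({"AK": [0.0, -1.0, 3.0]}, index=dates),
]

main_df = pd.DataFrame()
for df in frames:
    if main_df.empty:
        main_df = df
    else:
        # axis=1 adds columns instead of stacking rows
        main_df = pd.concat([main_df, df], axis=1)

print(main_df)
```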

When I run the same code you have, I get "FileNotFoundError: [Errno 2] No such file or directory: 'fiddy_states3.pickle'". Did I miss something, or doesn't pickle create the file on the fly?

alsherbin
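Opening a file with mode "wb" creates it, but opening with "rb" requires it to exist already, so the step that pickles the data has to run (in the same working directory) before the step that loads it. A sketch using a temporary directory and a made-up object; only the filename is borrowed from the comment:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fiddy_states3.pickle")

# Reading before anything has been written fails: "rb" never creates a file
try:
    with open(path, "rb") as f:
        pickle.load(f)
    read_first_failed = False
except FileNotFoundError:
    read_first_failed = True

# "wb" creates the file, and pickle.dump writes the object into it
with open(path, "wb") as f:
    pickle.dump({"AL": [0.0, 2.0]}, f)

# Now loading works
with open(path, "rb") as f:
    data = pickle.load(f)
```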

When I add df = df.pct_change(), it still prints out the way it did originally, as a normal graph and not as percent change. Any ideas why?

Dockmark
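One possible cause, assuming the plot is drawn from a different object than the one that was reassigned (for example a DataFrame reloaded from an older pickle): `pct_change()` returns a new DataFrame and leaves the original untouched. Note also that it computes the change from the previous row as a fraction, not the change from the first row times 100. A sketch with made-up values:

```python
import pandas as pd

dates = pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31"])
df = pd.DataFrame({"AL": [50.0, 51.0, 52.5]}, index=dates)  # made-up values

df.pct_change()                 # result discarded: df itself is unchanged
unchanged = df["AL"].tolist()

changed = df.pct_change()       # new DataFrame; first row is NaN (no previous row)
changed_pct = changed * 100.0   # scale fractions to percent if desired
```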

Hi sentdex, I have a question here. If I "add the percent change of all states and divide by 50" from the date '1975-02-28', shouldn't that equal the percent change of the USA HPI for that same date?

OLable
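Not necessarily: a plain average of state percent changes weights every state equally, while an aggregate figure is pulled toward the larger components. A toy arithmetic sketch with two hypothetical states, where the "national" value is simply the sum of both (an assumption for illustration; this is not how the FMAC national HPI is actually constructed):

```python
# Made-up start and end values for two hypothetical states
start = {"A": 100.0, "B": 400.0}
end = {"A": 120.0, "B": 400.0}

def pct(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100.0

# Unweighted mean of the per-state percent changes: (20 + 0) / 2
mean_of_changes = sum(pct(start[s], end[s]) for s in start) / len(start)

# Percent change of the summed values: (520 - 500) / 500 * 100
aggregate_change = pct(sum(start.values()), sum(end.values()))

print(mean_of_changes, aggregate_change)
```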

Can someone explain the logic of this statement, please?
df[abbv] = (df[abbv] - df[abbv][0]) / df[abbv][0] * 100.0
My understanding is that df[abbv] is a column of the current DataFrame, for example the entire 'AL' series for all the years, which means 498 values.
df[abbv][0] is just the first year's value in the current DataFrame. So how does it work conceptually when we do arithmetic between 498 values and only one value? What does it actually do? Does it take the percentage change of all 498 values relative to the first value?
Thanks

kuatroka
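Yes: when a Series is combined with a single number, pandas broadcasts the scalar to every element, so each value is compared against the first one. A sketch with four made-up values; it uses `.iloc[0]` to ask for the first row by position explicitly, since plain `[0]` on a date-indexed Series relies on a positional fallback that recent pandas versions deprecate:

```python
import pandas as pd

dates = pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31", "1975-04-30"])
s = pd.Series([50.0, 51.0, 49.0, 55.0], index=dates)  # made-up values

first = s.iloc[0]  # a single float: the value on the earliest date

# Every element of s is paired with the same scalar 'first'
pct_from_start = (s - first) / first * 100.0

# The same calculation written out element by element
manual = [(v - first) / first * 100.0 for v in s]
```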

This is so interesting, man. Thank you so much.

andyhutch

What are these calculations you're talking about that start at 100? I didn't quite get this part, or why the values converge at 100...

rodrigosilvanader
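A guess at what is meant, since the comment does not say exactly: besides the percent-change-from-start formula (where every series begins at 0), a common convention is to rebase each series so its first value is 100, i.e. value / first * 100. Under that convention all series start together at 100, and the two views differ only by a constant 100. This is a general indexing technique, not necessarily the exact calculation in the video. A sketch with made-up values:

```python
import pandas as pd

dates = pd.to_datetime(["1975-01-31", "1975-02-28", "1975-03-31"])
df = pd.DataFrame({"AL": [50.0, 51.0, 52.5], "AK": [60.0, 57.0, 63.0]}, index=dates)

first_row = df.iloc[0]  # one starting value per column

# Rebased index: every column starts at exactly 100
rebased = df / first_row * 100.0

# Percent change from start: every column starts at 0
pct_from_start = (df - first_row) / first_row * 100.0

# Largest gap between (rebased - 100) and the percent-change view
gap = (rebased - 100.0 - pct_from_start).abs().max().max()
```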

Hi, I'm just starting out and using Jupyter from Anaconda. Will the code still work?

louisscicluna

Hi, I'm getting this error: FileNotFoundError: [Errno 2] No such file or directory: 'pickle.pickle'. Any clues on how to solve it? I tried pip install pickle, but I still get the same error.

kmillanr
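`pickle` is part of Python's standard library, so there is nothing to install; the error means the file does not exist yet where Python is looking (relative paths resolve against `os.getcwd()`). One way to handle this is to build and save the data on the first run, then load it afterwards. A sketch; `build_data` and `load_or_build` are hypothetical names, and `build_data` stands in for the slow download step:

```python
import os
import pickle
import tempfile

def build_data():
    # Hypothetical stand-in for the function that downloads the data
    return {"AL": [50.0, 51.0, 52.5]}

def load_or_build(path):
    """Load a pickled object from path, creating and saving it first if the file is missing."""
    if not os.path.exists(path):
        with open(path, "wb") as f:
            pickle.dump(build_data(), f)
    with open(path, "rb") as f:
        return pickle.load(f)

path = os.path.join(tempfile.mkdtemp(), "pickle.pickle")
first = load_or_build(path)   # file missing: builds and saves it
second = load_or_build(path)  # file present: just loads it
```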

There is actually a problem when you compute percentages like this:
"df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0"
It can produce negative values, which don't make sense as a percentage.

mozarter
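A negative result from this formula is still a meaningful percent change: it says the value has fallen below its starting level. Going from 100 to 80, for example, is a change of -20%, and as long as values stay positive the result can never drop below -100%. A small sketch with made-up values:

```python
def pct_from_start(start, current):
    """Percent change of current relative to start; negative means a decline."""
    return (current - start) / start * 100.0

values = [100.0, 110.0, 80.0]  # made-up index values
changes = [pct_from_start(values[0], v) for v in values]
```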

I am surprised that df[abbv] worked and did not throw an error. I thought the correct syntax was either df['abbv'] or df.abbv. I also have a DataFrame with one column, and it does not let me reference it the way Harrison did. Any thoughts?

lalu
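In the tutorial, `abbv` is a loop variable holding a string such as 'AL', so `df[abbv]` looks up the column whose name is that string. `df['abbv']` (and `df.abbv`) would instead look for a column literally named "abbv". A single-column frame can only be referenced this way if its column actually carries that name, which is why the comments above rename the column first. A sketch with a hypothetical column:

```python
import pandas as pd

df = pd.DataFrame({"AL": [50.0, 51.0, 52.5]})  # made-up values
abbv = "AL"  # the variable holds the column name as a string

by_variable = df[abbv]  # same lookup as df["AL"]
by_literal = df["AL"]

try:
    df["abbv"]  # looks for a column literally named 'abbv'
    literal_found = True
except KeyError:
    literal_found = False
```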