Pickling - p.7 Data Analysis with Python and Pandas Tutorial

Welcome to Part 7 of our Data Analysis with Python and Pandas tutorial series. In the last couple of tutorials, we learned how to combine data sets. In this tutorial, we resume under the premise that we're aspiring real estate moguls. We're looking to protect our wealth by diversifying it, and one component of this is real estate.

Comments

Not sure if I was doing something wrong on this one, but continually received the error:

"ValueError: columns overlap but no suffix specified: Index([u'Value'], dtype='object')"

If anyone else faces the same problem:
1) It is caused by each new df having a column named 'Value', so the tables to be joined have overlapping columns ('Value' appears to be the default name).
2) It is fixed by adding the middle line below:
...df = Quandl.get(query, authtoken=api_key)
df.columns = [str(abbv)]
if main_df.empty: ....
to ensure each column has a unique name
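
The error and the fix described above can be reproduced without any download. This is a minimal sketch, assuming two small made-up frames (`ak` and `al`, with invented values) that stand in for the per-state results; it is not the tutorial's own code.

```python
import pandas as pd

# Two stand-in frames that mimic per-state results sharing the column name 'Value'.
# The dates and numbers are invented for illustration.
idx = pd.to_datetime(["2000-01-31", "2000-02-29"])
ak = pd.DataFrame({"Value": [100.0, 101.0]}, index=idx)
al = pd.DataFrame({"Value": [90.0, 91.5]}, index=idx)

# Joining as-is fails because both frames carry a column called 'Value'.
try:
    ak.join(al)
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)

# Give each frame a unique column name, then join on the shared index.
ak.columns = ["AK"]
al.columns = ["AL"]
joined = ak.join(al)

print(overlap_error)
print(joined)
```

Assigning to `df.columns` replaces every column name at once, so it only fits frames with exactly one column.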

Thanks for the vids sentdex!

markjam

The problem is that the dataset now has two columns, and their names have changed. You cannot keep the same column names in every frame, because they would no longer be unique after the join.
The two columns are Not Seasonally Adjusted (NSA) and Seasonally Adjusted (SA).
Use this line:
df.rename(columns={'NSA Value':str(abbv) + ' NSA Value', 'SA Value' : str(abbv) + ' SA Value'}, inplace=True)

For example, for Alaska it renames: NSA Value -> AK NSA Value
You will understand more in the next video.
Hope this helps.

backitdev

For those of you getting an error like this: ValueError: Invalid header value 'zKCyRs4oPSLKxgcbvARV\n', just remove the authtoken argument from the df = quandl.get call.
Since the dataset now has two columns, instead of the fix in the first two comments here (df.columns = [str(abbv)]),
you are going to use: df.rename(columns={'NSA Value':str(abbv) + ' NSA Value', 'SA Value' : str(abbv) + ' SA Value'}, inplace=True)
That's it :)
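
The `\n` at the end of the rejected header value suggests the key was read from a text file together with its trailing newline. If that is the cause, stripping whitespace when reading the key is an alternative to dropping the token. A minimal sketch, where the helper `read_api_key`, the file name, and the key are all made up for illustration:

```python
import os
import tempfile


def read_api_key(path):
    """Return the key stored in a text file, without surrounding whitespace."""
    with open(path, "r") as f:
        return f.read().strip()


# Demo with a throwaway file holding a made-up key followed by a newline.
with tempfile.TemporaryDirectory() as tmp:
    key_path = os.path.join(tmp, "quandlapikey.txt")  # hypothetical file name
    with open(key_path, "w") as f:
        f.write("NOT-A-REAL-KEY\n")
    api_key = read_api_key(key_path)

print(repr(api_key))
```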

CyborgGaming

For those who see ValueError: columns overlap but no suffix specified, I solved it by renaming 'Value' to str(abbv). The exact code is:
df = Quandl.get(query, authtoken=api_key)
df.rename(columns={'Value':str(abbv)}, inplace=True)
This renames the column 'Value' to each abbv string, so the join works.

charmisuk

No concern at all! Keep them coming, man! I love pandas because of you!

awaraamin

For those of you who are having issues using join, you need to create unique column names:


import pandas as pd
import quandl

# us_states and my_key are assumed to be defined earlier in the script
main_df = pd.DataFrame()

for state in us_states[0][0][2:]:
    query = "FMAC/HPI_" + str(state)

    df = quandl.get(query, api_key=my_key)

    # create a unique column
    df.rename(columns={'Value': state}, inplace=True)

    if main_df.empty:
        main_df = df
    else:
        main_df = main_df.join(df)

print(main_df.head())

manCoder

'wb' is not "write bytes" but "write binary". Same for 'rb'.
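
For reference, here is a minimal round trip with the standard `pickle` module, showing where the binary modes `'wb'` and `'rb'` come in. The dictionary contents and the file name are made up for illustration.

```python
import os
import pickle
import tempfile

# Made-up data standing in for whatever object needs to be saved.
data = {"AK": [100.0, 101.0], "AL": [90.0, 91.5]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "states.pickle")

    # 'wb' opens the file for writing in binary mode; pickle emits bytes, not text.
    with open(path, "wb") as f:
        pickle.dump(data, f)

    # 'rb' opens the file for reading in binary mode.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == data)
```

pandas also provides `DataFrame.to_pickle` and `pandas.read_pickle`, which take a path and handle the file for you.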

jope

Keep them coming! Thank you for the good job!!!

opalkabert

I finally understand pickling and its benefits thanks to your video. Thanks for that.

charlesutton

Still getting "ValueError: columns overlap but no suffix specified: Index([u'Value'], dtype='object')" even after following the solutions people have provided here? Use this instead:

df = Quandl.get(query, authtoken=api_key)
df.rename(columns={'NSA Value':str(abbv) + ' NSA Value', 'SA Value' : str(abbv) + ' SA Value'}, inplace=True)

This should solve your problem. If it still doesn't, look at the error carefully:
ValueError: columns overlap but no suffix specified: Index(['NSA Value', 'SA Value'], dtype='object')

---- Mine mentioned NSA Value and SA Value explicitly, so basically you just need unique names for the columns that appear in your result when you fetch the data. Check carefully which columns the quandl API delivers at the time you are using it, then replace each of those names with a unique name for every dataframe object.
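
One way to make the names unique without hard-coding whatever columns the API currently returns is to prefix every column with the state abbreviation. A minimal sketch using `DataFrame.add_prefix`, with a hypothetical helper `prefix_columns` and made-up frames in place of real downloads:

```python
import pandas as pd


def prefix_columns(df, abbv):
    """Return a copy of df whose column names all start with the abbreviation."""
    return df.add_prefix(f"{abbv} ")


# Made-up frames that mimic the two-column layout mentioned in the comments.
idx = pd.to_datetime(["2000-01-31", "2000-02-29"])
ak = pd.DataFrame({"NSA Value": [99.0, 100.5], "SA Value": [100.0, 101.0]}, index=idx)
al = pd.DataFrame({"NSA Value": [89.0, 90.5], "SA Value": [90.0, 91.5]}, index=idx)

# After prefixing, no column name appears in both frames, so the join succeeds.
main_df = prefix_columns(ak, "AK").join(prefix_columns(al, "AL"))
print(list(main_df.columns))
```

Because the prefix is applied to whatever columns exist, this keeps working if the column names change again.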

pythonista

Out of the blue: which keyboard do you use? It has a great tactile sound in the videos.

siddhantwade

Thanks a lot for your tutorial... it's just super easy to understand. thanks again for this amazing tutorial series.

robin

If you get an error regarding quandl:
sentdex merges the data frames for all 50 states into one, but FMAC/HPI_MS and FMAC/HPI_WY are not currently present on quandl, so you get an error.
So remove those and then run again.
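
Instead of editing the list by hand, the loop can skip codes that fail to load. This is only a sketch: `fetch_state` is a hypothetical stand-in for the real download call, and `LookupError` is a placeholder for whatever exception the real client raises when a dataset is missing.

```python
import pandas as pd

# Made-up data: only these two codes "exist" in this demo.
AVAILABLE = {"AK": [100.0, 101.0], "AL": [90.0, 91.5]}


def fetch_state(abbv):
    """Hypothetical stand-in for the download; raises LookupError if the code is missing."""
    if abbv not in AVAILABLE:
        raise LookupError(f"FMAC/HPI_{abbv} not found")
    return pd.DataFrame({abbv: AVAILABLE[abbv]})


main_df = pd.DataFrame()
skipped = []

for abbv in ["AK", "MS", "AL", "WY"]:
    try:
        df = fetch_state(abbv)
    except LookupError:
        # Record the missing code and carry on with the next state.
        skipped.append(abbv)
        continue
    main_df = df if main_df.empty else main_df.join(df)

print(skipped)
print(list(main_df.columns))
```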
Also, if you get ValueError: columns overlap but no suffix specified:
The problem is that the dataset now has two columns, and their names have changed. You cannot keep the same column names in every frame, because they would no longer be unique after the join.
The two columns are Not Seasonally Adjusted (NSA) and Seasonally Adjusted (SA).
Use this line (each column needs its own name; mapping both to str(abbv) would leave two columns with the same label):
df.rename(columns={'NSA Value':str(abbv) + ' NSA Value', 'SA Value' : str(abbv) + ' SA Value'}, inplace=True)
Then it will run.

ayushtibra

I think the data for some of the states does not exist anymore.
FMAC/HPI_AL
FMAC/HPI_AK
Traceback (most recent call last):

georgitanev-wb

Unfortunately, the data you are using isn't available at Quandl anymore.

szecek

Harrison, thanks for your videos, I find them immensely helpful. I had a question about fiddy_states(), and in particular about tree of the fiddy states...but I forgot it. Thanks again

sqadri

As one of the comments below suggests, over a year ago Quandl changed the dataset to two columns, NSA (not seasonally adjusted) and SA (seasonally adjusted), unlike Sentdex's single "Housing price" column. I decided to go with SA, just adding one more line:
df.columns = [str(abbv)]. The DataFrame then works perfectly, like Sentdex's.
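
Note that `df.columns = [str(abbv)]` supplies a single name, so on a two-column frame it raises a length-mismatch error unless one column is selected first. A minimal sketch of keeping only the SA column before renaming, using a made-up frame and assuming the column names quoted elsewhere in these comments:

```python
import pandas as pd

# Made-up frame with the two-column layout.
idx = pd.to_datetime(["2000-01-31", "2000-02-29"])
df = pd.DataFrame({"NSA Value": [99.0, 100.5], "SA Value": [100.0, 101.0]}, index=idx)
abbv = "AK"

# Assigning one name to two columns fails.
probe = df.copy()
try:
    probe.columns = [abbv]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

# Keep only the seasonally adjusted column, then give it the state's name.
sa = df[["SA Value"]].copy()
sa.columns = [abbv]

print(length_error)
print(sa)
```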

TXfoxie

We can load the def function code lines??

sensei-guide

I got only two columns by adding this in the join part:
main_df.join(df, lsuffix='left', rsuffix='right'). Otherwise I was getting the column overlap / suffix error.
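
To see what the suffix arguments do, here is a minimal sketch with two made-up one-column frames. The underscore in the suffixes is a choice made here for readability; it is not part of the comment above.

```python
import pandas as pd

# Made-up frames that both use the column name 'Value'.
idx = pd.to_datetime(["2000-01-31", "2000-02-29"])
left = pd.DataFrame({"Value": [100.0, 101.0]}, index=idx)
right = pd.DataFrame({"Value": [90.0, 91.5]}, index=idx)

# join returns a new frame, so the result has to be assigned to a name.
joined = left.join(right, lsuffix="_left", rsuffix="_right")
print(list(joined.columns))
```

The suffixes avoid the error, but the resulting names no longer say which state each column belongs to, which is why the other comments rename the columns instead.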

Kaushal_Codes

Got the solution by myself. The data has changed: the column name is now the same in each db, so JOIN doesn't work well. There is a fix: in each iteration, simply rename the column of each db using " df.columns = [str(abbr)] "
Then use JOIN.

theycallmemorphine