IO Basics - p.3 Data Analysis with Python and Pandas Tutorial

Welcome to Part 3 of Data Analysis with Pandas and Python. In this tutorial, we will begin discussing IO, or input/output, with Pandas, starting with a realistic use case. For ample practice, a very useful website is Quandl, which hosts a plethora of free and paid data sources. What makes it great is that the data is generally normalized, it's all in one place, and every dataset is extracted the same way. If you are using Python and access the Quandl data via their simple module, the data is automatically returned as a dataframe. In this tutorial, though, we're going to manually download a CSV file instead, since not every data source you find is going to have a nice and neat module for extracting the datasets.

Let's say we're interested in maybe purchasing or selling a home in Houston, Texas, in the 77006 zip code. We could go to the local housing listings and see what the current prices are, but this doesn't give us any real historical information, so let's try to get some data on this. Let's query for "home value index 77006." Sure enough, we can see an index here. There's top, middle, and lower tier, three bedroom, and so on. Let's say we've got a three bedroom house, so let's check that one out. It turns out Quandl already provides graphs, but let's grab the dataset anyway, make our own graph, and maybe do some other analysis. Go to download, and choose CSV.

Pandas is capable of IO with csv, excel data, hdf, sql, json, msgpack, html, gbq, stata, clipboard, and pickle data, and the list continues to grow. Check out the IO Tools documentation for the current list. Take that CSV and move it into the local directory (the directory you are currently working in, where this .py script is).
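As a rough sketch of what reading that file could look like once it sits next to your script: the file name below is hypothetical, and the `Date` and `Value` column names are an assumption about how the download is laid out, so adjust both to match your actual file. To keep the sketch runnable without the download, it first writes a tiny stand-in file whose numbers are placeholders, not real housing data.

```python
import pandas as pd

# Hypothetical file name; use whatever name your downloaded CSV actually has.
CSV_PATH = "home_values_77006.csv"

# Stand-in for the downloaded file so this sketch runs on its own.
# Column names and values are placeholders, not real data.
with open(CSV_PATH, "w") as f:
    f.write("Date,Value\n2015-01-31,100.0\n2015-02-28,101.5\n2015-03-31,103.0\n")

# Read the CSV into a DataFrame, parsing the Date column as datetimes.
df = pd.read_csv(CSV_PATH, parse_dates=["Date"])

# Use the dates as the index, which is convenient for time-series data.
df = df.set_index("Date")

print(df.head())
```

Passing `index_col=0` to `read_csv` is another way to get the first column as the index in a single step.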

Comments

After discovering pandas can import JSON I paused your video, installed it, and watched it do, in 5 minutes, what took me hours and hours to accomplish. You've earned a loyal fan! Thanks so much for this.

I feel foolish for overlooking pandas.

Achooification

When I start watching one of them I can't stop watching the rest! Your way of teaching is unique, man! Thanks a lot.

awaraamin

Thanks for the tutorial. Btw, your keyboard noise is mesmerizing, and I mean it.

maximfrewdell

Hi Harrison,

One question if I may. If I put: df.columns = ['House_Prices'] I get an error (IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices).

But, when I put: df.rename(columns={'Value': 'Housing_price'}, inplace=True) the program runs. Can you comment on that and/or update it on pythonprogramming.net? Thanks.

antonijarimac
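
On the column-naming question above, the comment doesn't include enough detail to say what caused the IndexError, so the following is only a sketch comparing the two approaches on a placeholder frame (made-up values, with an assumed `Date` index and `Value` column). One thing it does illustrate: assigning to `df.columns` needs exactly one label per column, whereas `rename` only touches the columns you name.

```python
import pandas as pd

# Placeholder frame standing in for the tutorial's data:
# dates as the index and a single "Value" column (assumed layout).
df = pd.DataFrame(
    {"Value": [100.0, 101.5, 103.0]},
    index=pd.Index(["2015-01-31", "2015-02-28", "2015-03-31"], name="Date"),
)

# Option 1: replace every column label at once.
# The list must have exactly as many entries as there are columns.
by_assignment = df.copy()
by_assignment.columns = ["House_Prices"]

# Option 2: rename by old label; columns not mentioned keep their names.
by_rename = df.rename(columns={"Value": "House_Prices"})

# If the dates are a regular column rather than the index, the frame has
# two columns, and a one-item list no longer fits.
two_cols = df.reset_index()
try:
    two_cols.columns = ["House_Prices"]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(by_assignment.columns.tolist(), by_rename.columns.tolist())
print(type(mismatch_error).__name__)
```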

Your tutorials are great.
Watching them in x3 speed is PERFECT

This is how you do it
1) Ctrl+Shift+J
2) document.getElementsByTagName('video')[0].playbackRate = 3
3) Enter

tenderwobi

Some of the best tutorials on the interwebs, thank you sir.

Locke

Great work. Using this to supplement some in-class teaching, and finding this more useful so far. Thanks for all the help.

louispowell

Hello,
df = pd.read_csv('newcsv2.csv') is removing my header row & my index. In the tutorial you only cover how to get the index back. Any idea what's happening here?

neilturner

Thanks a lot sentdex! I am reading "Python for data analysis" and this is a great introduction to pandas that will make my study of the book much easier!

OttoFazzl

Sir, I have a data set that has 'Date', 'Good Job', 'Poor Job' etc. in the form of a csv file. When I use pd.read_csv and specify index_col=0, it takes the first column of my data, 'Date', as the index. From then on, whenever I try to convert the data to a numpy array, it throws a list of errors. However, when I create the numpy array without using index_col=0, it works. Can you please explain the reason for this to me? Thank you.

rajdeepchatterjee

It's 2018 and Quandl now requires you to create an account to download data. Just FYI.

saidharshanshan

Please write out all the URLs you use, like that of the dataset at 2:51. It's difficult to find among the many other datasets because they keep updating theirs.

adarshtadwai

Aw yeee, I've already got all of the Quandl and eoodata data downloaded, dating a few years back. Excited that I'm going to be able to put it to use now. Now I just have to figure out how to deal with missing data and date formatting that isn't consistent.

prophetting

When you dragged the Zillow data (from Quandl) at about 17:11, where did you drag it to?

edwardhouser

Awesome work. It's helping me a lot to make life easier with Python. Thanks a lot.

narayana

+Sentdex Thank you so much, "You are a BEAST"! Your tutorials are dead simple. I have been studying for Microsoft Business Intelligence (MSBI). However, I just lost interest in it. I felt I could do something more meaningful using Python. My aim is to enter the field of Data Analysis. My background is mostly computer networking, but the field seems saturated, so I want to switch careers and go back to my programming roots because I love Python so much. Thank God I found you. You are indeed a life saver. I would like to stay in close communication with you so you could advise me on what to learn to become proficient in Data Analysis using Python. Thank you so much. Wow, you are truly a genius!

theword

Thanks for these videos! I'm learning a lot and it's very helpful.

tmemo

Thank you for this great tutorial. Great help. Can you please make another video on reading big csv files with pandas that do not fit in the system's RAM and give memory errors? I think that's one area where pandas kind of fails. Any tutorial on reading huge csv files and saving pandas data frames to pickle or hdf would be of great help.

GauravKumar-igiy
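
For the large-file question above, one commonly used approach is the `chunksize` parameter of `pd.read_csv`, which yields the file a piece at a time instead of loading it all at once. This is only a sketch under stated assumptions: the file names are hypothetical, the `key` and `amount` columns are made up for illustration, and a tiny stand-in file plays the role of the "huge" CSV. It keeps a running per-key total so that only one chunk is in memory at a time, then saves the small result to pickle.

```python
import pandas as pd

# Hypothetical paths, used only for this illustration.
CSV_PATH = "big_file_example.csv"
PICKLE_PATH = "totals_example.pickle"

# Tiny stand-in for a file that would not fit in RAM (placeholder data).
pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a", "b"], "amount": [1, 2, 3, 4, 5, 6]}
).to_csv(CSV_PATH, index=False)

# With chunksize set, read_csv returns an iterator of DataFrames
# holding at most that many rows each.
totals = None
for chunk in pd.read_csv(CSV_PATH, chunksize=2):
    part = chunk.groupby("key")["amount"].sum()
    # Combine with the running result; fill_value covers keys missing on one side.
    totals = part if totals is None else totals.add(part, fill_value=0)

# Save the reduced result and load it back.
totals.to_pickle(PICKLE_PATH)
restored = pd.read_pickle(PICKLE_PATH)

print(restored)
```

Whether this helps depends on the analysis: it works when each chunk can be reduced independently, as with sums or counts, and less well when an operation needs the whole dataset at once.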

Can someone help? I'm trying to import a table from the wiki for the S&P 500 and followed everything @sentdex did for ABBV, but I get the error "ImportError: html5lib not found, please install it." I don't know how to install it on Mac and cannot find instructions on how to.

alanvoong

Really good video. Helping me a lot, thanks.

shashwatbilgrami