Data Type Conversion In Python - IRS Taxpayer Data!

In this video you will learn how to do quick and easy data type conversions in Python. We will use Jupyter notebooks and Python 3. As with all my other videos on this channel, we will use real data. In this case we will use the fun IRS US taxpayer dataset!
Disclaimer - So that no one gets their hopes up about seeing real taxpayer names and the like: the dataset has been scrubbed, so it contains no personally identifiable information - otherwise we would not be able to show it to you. However, you might find it interesting to see the data the IRS actively holds on US taxpayers (how you vote, etc.). They have been known to use this info to target audits in the past...
Anyway, we will start by loading the 2 required libraries (Pandas and NumPy). We will mostly use Pandas here, but there is one instance where we will need NumPy - to correct a default integer setting that we cannot fix with Pandas alone. The data type defaults to int32, and we want to get it back to int64; this requires NumPy, as I will show you.
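As a rough sketch of that int32-to-int64 step (not the exact notebook code from the video): the column name `hh_income` and the values below are made up for illustration, and the example assumes the column arrived as int32.

```python
import numpy as np
import pandas as pd

# Hypothetical column name and made-up values; the real dataset is not reproduced here.
# The int32 dtype is forced on purpose to mimic the default described above.
data1 = pd.DataFrame(
    {"hh_income": np.array([52000, 84000, 120000], dtype=np.int32)}
)
dtype_before = data1["hh_income"].dtype
print(dtype_before)

# Cast the column to a 64-bit integer using the NumPy type.
data1["hh_income"] = data1["hh_income"].astype(np.int64)
dtype_after = data1["hh_income"].dtype
print(dtype_after)
```

Passing the NumPy type to `astype` is one way to make the target width explicit; the values themselves are unchanged by the cast.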
We will look at determining the data types of the dataframe, of individual columns and of all columns. Then I will go through switching from an integer to a float, back to an integer, and then converting to a string. We will also cover tuples, lists and sets. We will finish up by graphing the data in 2 scatter plots:
1) a generic scatter plot on the overall data1 dataframe columns.
2) a filtered subset that zooms in on taxpayers with household debt levels above $10,000 and HHI (Household Income) below $85k who did not file tax returns in 2015, 2016 and 2017.
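The type checks and conversions mentioned before the two plots can be sketched roughly as follows. This is an illustration under assumptions, not the notebook from the video: the column names `hh_debt` and `party` and all values are hypothetical.

```python
import pandas as pd

# Made-up rows with hypothetical column names, for illustration only.
data1 = pd.DataFrame(
    {"hh_debt": [12000, 3000, 45000], "party": ["D", "R", "I"]}
)

# Inspecting types: the dataframe object, every column, and a single column.
frame_type = type(data1)
all_dtypes = data1.dtypes
one_dtype = data1["hh_debt"].dtype

# Integer -> float -> integer -> string.
as_float = data1["hh_debt"].astype("float64")
back_to_int = as_float.astype("int64")
as_string = data1["hh_debt"].astype(str)

# Column values as a list, a tuple and a set.
as_list = data1["party"].tolist()
as_tuple = tuple(as_list)
as_set = set(as_list)

print(all_dtypes)
print(as_string.tolist())
```

Note that `astype` returns a new Series each time, so the original column stays untouched unless the result is assigned back to it.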
The first scatter plot yields little, if anything, because all three parties (independent, Republican and Democrat) are well represented at all ranges. The second one, however, yields some interesting data. Democrats are 4 times more likely than independents and 12 times more likely than Republicans to not file their returns.
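A filtered subset like the one behind the second plot could be built along these lines. Everything here is an assumption for illustration: the column names (`party`, `hh_debt`, `hhi`, `filed_2015` and so on) are hypothetical, the rows are made up, and "did not file in 2015, 2016 and 2017" is read as not filing in any of the three years.

```python
import pandas as pd

# Made-up rows with hypothetical column names; not the real IRS dataset.
data1 = pd.DataFrame({
    "party": ["D", "R", "I", "D", "R"],
    "hh_debt": [15000, 8000, 22000, 30000, 12000],
    "hhi": [60000, 90000, 70000, 84000, 50000],
    "filed_2015": [False, True, False, False, False],
    "filed_2016": [False, True, False, True, False],
    "filed_2017": [False, True, False, False, False],
})

# Assumption: a taxpayer qualifies only if no return was filed in any of the years.
no_returns = ~(data1["filed_2015"] | data1["filed_2016"] | data1["filed_2017"])

# Debt above $10,000, household income below $85k, and no returns filed.
mask = (data1["hh_debt"] > 10_000) & (data1["hhi"] < 85_000) & no_returns
subset = data1[mask]

# Raw number of matching rows per party in the subset.
counts = subset["party"].value_counts()
print(counts)
```

These are raw counts only. To compare how likely each party is to fall into the subset, the counts would need to be divided by the number of rows per party in the full dataframe. With matplotlib installed, `subset.plot.scatter(x="hhi", y="hh_debt")` would draw the corresponding scatter plot.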
Regardless of how we all vote politically (I am registered as a Democrat by the way, so no haters here), this is some interesting stuff and shows you a little of what you can quickly and easily do with Python, Jupyter notebooks and some interesting data (like the IRS taxpayer dataset). Technically, what we just did here qualifies as a little exploratory data analysis - hint, hint!
Thanks for watching!
Please take a moment to subscribe, like and share, and be sure to click the bell so you get notified every time I publish a great video like this one!
Thanks again and God Bless!