Python Tutorial: Census Subject Tables


---

Welcome to Analyzing US Census Data in Python.

My name is Lee Hachadoorian. I am a geographer teaching graduate courses in Geographic Information Systems and geospatial analysis.

In this course, you will learn how to access and analyze United States census data. Most governments need to know the number and characteristics of their population.

In the US, a census is taken every 10 years by the Census Bureau, a bureau of the Department of Commerce. Often we just refer to "the Census" when we mean the Decennial Census of Population and Housing. But the Census Bureau produces many other data products.

In this course, you will also learn about the annual American Community Survey. Once you learn the Census API, you can use it to explore other products, including the Current Population Survey, a monthly survey used to calculate official unemployment figures; the Economic Census, a survey of businesses conducted every five years; and the Annual Survey of State and Local Government Finances, which can be used to study taxes and service provision by subnational governments.

This course assumes knowledge of core Python object types, as well as programming concepts such as package imports, control flow, and list comprehensions.

It also assumes an understanding of Pandas data frames, which we will work with a lot. You will create some simple visualizations with seaborn and geopandas, but no previous experience with those two libraries is assumed.

The Decennial Census counts, as near as possible, all persons and housing in the United States. It covers demographic topics such as age and sex, and housing topics such as homeownership and persons per room. People living in group quarters are counted separately. Vacant housing units are also counted.

The American Community Survey is an annual survey of approximately 1.5% of housing units that covers a large number of economic and social topics, which we will explore throughout this course.

The data is released in "subject tables" devoted to specific topics. We will familiarize ourselves with subject tables by working with table P5, "Hispanic or Latino Origin by Race". The column identifiers begin with P005 for the subject table, followed by a column index from 1 to 17. Column 1 is the total population. It is broken down into two categories: "Not Hispanic or Latino" in column 2 and "Hispanic or Latino" in column 10. Each of these columns is broken down further into seven racial groupings. Indented columns add up to their outdented parent. For example, columns 3 through 9 add up to column 2, and columns 11 through 17 add up to column 10.
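To make the parent/child structure concrete, here is a minimal sketch in plain Python. It assumes the column index is zero-padded to three digits (so column 1 is P005001), and the counts are made-up toy values for a hypothetical area, not real census figures. The helper names column_id and children_sum are illustrative only.

```python
def column_id(index):
    """Return the identifier for a P5 column, assuming 3-digit zero padding."""
    return f"P005{index:03d}"


# Toy counts (NOT real census data) for the seven racial groupings.
not_hispanic_by_race = [50, 20, 5, 10, 1, 2, 12]  # columns 3-9
hispanic_by_race = [15, 3, 2, 1, 0, 20, 9]        # columns 11-17

row = {}
for offset, value in enumerate(not_hispanic_by_race):
    row[column_id(3 + offset)] = value
for offset, value in enumerate(hispanic_by_race):
    row[column_id(11 + offset)] = value

# Parent columns are the sums of their indented children.
row[column_id(2)] = sum(not_hispanic_by_race)   # Not Hispanic or Latino
row[column_id(10)] = sum(hispanic_by_race)      # Hispanic or Latino
row[column_id(1)] = row[column_id(2)] + row[column_id(10)]  # Total


def children_sum(row, first, last):
    """Sum the columns with indexes first..last (inclusive)."""
    return sum(row[column_id(i)] for i in range(first, last + 1))


# Check that each parent equals the sum of its children.
print(children_sum(row, 3, 9) == row[column_id(2)])
print(children_sum(row, 11, 17) == row[column_id(10)])
```

The same check is a quick sanity test on real data: if a parent column does not equal the sum of its children, the wrong columns were probably selected.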

We've created a Pandas data frame named "states" with data from table P5.

Each row is a state in the US. The variable codes have been replaced with descriptive column names, and the state name appears as the row index.
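As a rough sketch of how such a data frame could be prepared, the snippet below renames variable codes to descriptive names and moves the state name into the row index. The state names, the counts, and the descriptive column names are all placeholders chosen for illustration, and only three of the 17 columns are shown.

```python
import pandas as pd

# Hypothetical raw table with made-up counts (NOT real census data).
raw = pd.DataFrame({
    "NAME": ["State A", "State B", "State C"],
    "P005001": [150, 300, 90],
    "P005002": [100, 240, 60],
    "P005010": [50, 60, 30],
})

# Illustrative mapping from variable codes to descriptive names.
descriptive_names = {
    "NAME": "state",
    "P005001": "total",
    "P005002": "not_hispanic",
    "P005010": "hispanic",
}

# Rename the columns, then use the state name as the row index.
states = raw.rename(columns=descriptive_names).set_index("state")
print(states)
```

With the state name as the index, a single value can be looked up by label, for example states.loc["State B", "total"].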

We will use seaborn, imported here with the alias sns, for data visualization.
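A minimal sketch of such a plot follows. It is not necessarily the plot shown in the video: the data frame holds made-up placeholder values, the column name "total" is an assumption, and a non-interactive matplotlib backend is selected so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened

import pandas as pd
import seaborn as sns

# Placeholder data frame (NOT real census data), state name as the index.
states = pd.DataFrame(
    {"total": [150, 300, 90]},
    index=pd.Index(["State A", "State B", "State C"], name="state"),
)

# seaborn reads named columns, so move the index back into a column first.
ax = sns.barplot(x="state", y="total", data=states.reset_index())
```

seaborn labels each axis with the name of the column plotted on it, so descriptive column names carry straight through to the figure.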

The axis labels are a bit crowded, but we're not going to spend much time on plot customization in this course. Check out these other DataCamp courses if you want to go further.

Enough preamble. Let's jump in!
