R Tutorial: Census data in R: An overview

---

Welcome! I'm Kyle Walker, your instructor. In this course, you'll learn the basics of working with and visualizing data from the US Census Bureau in R, with a focus on the tidycensus, tigris, and ggplot2 R packages.

We'll first cover how to acquire US Census and American Community Survey data with the tidycensus package; then we'll learn how to wrangle Census data using tidyverse tools. We'll then cover Census spatial data in R using the tigris package, and finally, learn how to make attractive maps of Census data with ggplot2.

I do research and consult in the fields of spatial demography and spatial data science, and I'm the lead developer of the R tidycensus, tigris, and idbr packages.

The United States Census Bureau collects a vast amount of demographic, social, and economic data about the United States. The core resources for demographic data include the decennial Census and the American Community Survey. The decennial Census is conducted every ten years and is a complete count of the United States population. The American Community Survey, or ACS, in contrast, is a survey of around 3 million households taken every year. Whereas the decennial Census asks only about core demographic characteristics like age and race, the ACS asks a much broader range of questions.

The US Census Bureau makes its data available to users via an application programming interface, or API. The tidycensus R package wraps the decennial Census and ACS APIs, allowing R users to access Census data directly. Before using the package, users must acquire a Census API key, then supply it to the census_api_key() function. The key is a 40-character alphanumeric code similar to the one shown on the slide. Setting install = TRUE will store the key on the user's computer for future use. You won't need to get an API key for this course, but you will need one to use tidycensus on your own computer.
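A minimal sketch of that setup step might look like the following. The key string is a placeholder, not a real key; replace it with the one the Census Bureau issues to you before running the code.

```r
library(tidycensus)

# Placeholder only: substitute your own 40-character key before running.
# install = TRUE writes the key to your .Renviron file so that future
# R sessions can find it without calling this function again.
census_api_key("YOUR_CENSUS_API_KEY", install = TRUE)
```

After installing the key, restart R (or reload your .Renviron file) so that the stored key is picked up in the current session.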

The get_decennial() function in tidycensus allows R users to access data from the 1990, 2000, or 2010 decennial US Censuses, with 2010 as the default. In this example, we are acquiring data on total population by state. get_decennial() has two required arguments: geography, which specifies the level of aggregation for the Census data, and variables, a vector of Census variable ID codes for which you'd like to request data.

The function returns a tidy data frame with GEOID and NAME columns identifying the state; a variable column identifying the variable ID; and a value column showing the population value.
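The request described above can be sketched roughly as follows. The variable code "P001001" (total population in the 2010 decennial Census) and the object name state_pop are choices made here for illustration, and the code assumes an API key has already been set. The year is written out because the default may differ across tidycensus versions.

```r
library(tidycensus)

# Request total population for every state from the 2010 decennial Census.
# "P001001" is the total-population variable for 2010; year is set
# explicitly rather than relying on the package default.
state_pop <- get_decennial(
  geography = "state",
  variables = "P001001",
  year = 2010
)

# Tidy output: GEOID and NAME identify the geography, variable holds the
# variable ID, and value holds the population count.
head(state_pop)
```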

The get_acs() function allows R users to request data from the American Community Survey. The default dataset obtained by get_acs() is the 5-year, 2012-2016 ACS sample. Because ACS data are based on survey samples, they represent estimates of population characteristics rather than exact counts. ACS estimates are therefore accompanied by margins of error.
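As a rough sketch, a get_acs() request could look like this. The variable code "B19013_001" (median household income) and the object name state_income are picked here purely as an example, and an API key is assumed to be set already. The year and survey arguments are spelled out so that the result matches the 2012-2016 5-year sample regardless of the installed version's defaults.

```r
library(tidycensus)

# Request median household income by state from the 2012-2016 5-year ACS.
# "B19013_001" is the ACS median-household-income variable; year = 2016
# refers to the end year of the 5-year sample.
state_income <- get_acs(
  geography = "state",
  variables = "B19013_001",
  year = 2016,
  survey = "acs5"
)

head(state_income)
```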

The tidy data frame returned by get_acs() resembles the one returned by get_decennial(), but with one key difference. Instead of a value column representing the data value, get_acs() returns an estimate column containing the data estimate and a moe column containing the margin of error around that estimate at a 90 percent confidence level.
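To make the margin of error concrete, here is a small sketch that runs without any API call. It builds a hand-made data frame shaped like get_acs() output, which names its margin of error column moe, and derives the bounds of the 90 percent confidence interval. The area names and numbers are invented for illustration and are not real ACS figures.

```r
# Hand-made rows shaped like get_acs() output; the values are invented
# for illustration and are not real ACS estimates.
acs_like <- data.frame(
  NAME     = c("Area A", "Area B"),
  estimate = c(50000, 62000),
  moe      = c(1500, 800)
)

# The estimate minus and plus the margin of error gives the lower and
# upper bounds of the 90 percent confidence interval.
acs_like$lower <- acs_like$estimate - acs_like$moe
acs_like$upper <- acs_like$estimate + acs_like$moe

acs_like
```

For "Area A", the interval runs from 48,500 to 51,500. A wider interval relative to the estimate signals a less precise estimate.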

Let's get started working with Census and ACS data using tidycensus.