R Tutorial: Basic tidycensus functionality

---

Now that you've learned how to use the core functions in tidycensus, we're going to cover some of the basic functionality of the package. This includes how aggregate Census data are structured and how to return data in different formats.

Data from the decennial Census and American Community Survey are available across the United States at different levels of aggregation. Data can be obtained for legal entities, such as states and counties, which have legal standing in the US, and for statistical entities, such as Census tracts, which the Census Bureau defines for the purpose of data tabulation. In tidycensus, the level of aggregation is specified with the geography parameter, whose values correspond to the way geographies are formatted in the Census API. The tidycensus documentation at the included link provides a table showing how to format these geographies.
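As a rough sketch of how the geography parameter switches between a legal and a statistical geography, the calls below request the same variable for states and for Census tracts. They assume tidycensus is installed and that a Census API key has already been registered with census_api_key(); Travis County, Texas is used purely as an illustrative choice. Tract-level requests are made within a state, and the optional county argument narrows them further.

```r
library(tidycensus)

# Assumption: a Census API key was installed earlier, e.g. with
# census_api_key("YOUR_KEY", install = TRUE)

# Legal entity: one row per state
state_income <- get_acs(
  geography = "state",
  variables = "B19013_001"
)

# Statistical entity: Census tracts, requested within a state
# (and here narrowed to a single, illustrative county)
travis_tracts <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "TX",
  county = "Travis"
)
```

Tract GEOIDs are 11-digit codes that begin with the state and county FIPS codes, so every row in the second result should start with "48453".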

In this example, we use the argument geography equals county to request county data from the ACS for the entire United States. The other required argument is variables, to which we supply a Census variable ID. The requested variable ID, B19013_001, represents median household income for each county.
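A minimal version of the nationwide county request might look like the following, assuming a Census API key has already been set with census_api_key(). No year is given, so the package's default ACS year is used.

```r
library(tidycensus)

# Assumption: a Census API key is already installed for this session

# Median household income (ACS variable B19013_001) for every US county
us_county_income <- get_acs(
  geography = "county",
  variables = "B19013_001"
)

# Tidy output: GEOID, NAME, variable, estimate, moe
head(us_county_income)
```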

In many cases, you won't want data for the entire United States. You can use the optional state and county arguments to get data for a geographic subset, as in this example, where only counties in the state of Texas are returned. The example also shows how to pass a named vector to the variables argument, which yields a more informative value, such as hhincome for household income, in place of the Census variable ID.
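A sketch of the Texas request with a named vector follows, again assuming a Census API key is already registered. The name on the left of the vector element becomes the value stored in the variable column.

```r
library(tidycensus)

# Assumption: a Census API key is already installed for this session

# Texas counties only; "hhincome" replaces the raw variable ID in the output
tx_income <- get_acs(
  geography = "county",
  variables = c(hhincome = "B19013_001"),
  state = "TX"
)

head(tx_income)
```

Texas has 254 counties, so the result should contain one row per county, each with "hhincome" in the variable column.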

By default, tidycensus returns Census data in tidy format, where Census variable IDs are stored in the variable column. However, there may be times when you would prefer the variables to be spread across the columns. To do this, set the argument output equals wide, which returns a separate column in the output dataset for each variable's estimate and its corresponding margin of error.
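Here is the same Texas request with output set to "wide", assuming a Census API key is already registered. In wide output, tidycensus appends E to the estimate column name and M to the margin-of-error column name, so the variable and moe columns of the tidy format no longer appear.

```r
library(tidycensus)

# Assumption: a Census API key is already installed for this session

tx_income_wide <- get_acs(
  geography = "county",
  variables = c(hhincome = "B19013_001"),
  state = "TX",
  output = "wide"
)

# Expected columns: GEOID, NAME, hhincomeE (estimate), hhincomeM (margin of error)
names(tx_income_wide)
```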

Now let's try this out in R.