R Tutorial: Visualizing Census data with ggplot2

---
One important way to explore data acquired from the US Census Bureau is through visualization. In this lesson, we will cover basic visualization of Census data using the popular ggplot2 visualization package.
ggplot2 is the core plotting library within the Tidyverse, and one of the most popular packages for data visualization in R. ggplot2 enables users to create highly customizable plots through its layered grammar of graphics interface, in which users specify plot components as layers.
The code here illustrates a basic example of how to create a visualization of ACS data using ggplot2. We'll first fetch median household income data for seven states in the northeastern United States from the 2016 1-year American Community Survey.
The core ggplot() function in ggplot2 takes a dataset and an aesthetic mapping of its columns onto different elements of the plot, wrapped in aes(). In this example, we'll be creating a dot plot that compares the median incomes of the different states, with the value on the x-axis and the state name on the y-axis. The geom_point() function, added to the plot with the plus operator, tells ggplot2 to draw the data as dots.
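As a rough illustration of the call just described, here is a minimal sketch that skips the download from the Census Bureau. The seven states listed and the income figures are placeholder values chosen for illustration, not actual ACS estimates, and the column name NAME is an assumption (the narration only confirms a column called "estimate").

```r
library(ggplot2)

# Placeholder data: the state list and the values below are illustrative
# stand-ins, NOT real ACS estimates. The column name NAME is assumed.
ne_income <- data.frame(
  NAME = c("Maine", "New Hampshire", "Vermont", "Massachusetts",
           "Rhode Island", "Connecticut", "New York"),
  estimate = c(50000, 70000, 55000, 75000, 60000, 72000, 62000)
)

# Map the estimate to the x-axis and the state name to the y-axis,
# then add a layer of points with the plus operator.
p <- ggplot(ne_income, aes(x = estimate, y = NAME)) +
  geom_point()

print(p)
```

With real data, ne_income would be replaced by the table fetched from the 2016 1-year ACS; the plotting call itself would stay the same as long as the column names match.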
The image, as we've created it, allows for a comparison of median household income values by state based on the position of the dots. This type of chart, called a Cleveland dot plot, is a popular and effective way to compare these sorts of values. However, the plot as we've created it has several disadvantages. The dots are small and difficult to see; there is no information about what the "estimate" means; and the dots are not sorted, making trends difficult to discern. Further, the plot labels are quite small and hard to read. It will therefore be helpful to customize the appearance of the plot further.
The ggplot2 code shown here modifies the plot we already created to make it more legible. The changes start in the ggplot() and geom_point() calls. We're using the reorder() function to order our state dots by their ACS estimates in the chart, and we're changing the color and size of the dots in geom_point() to help with visibility. Beyond this, we are specifying some further elements of the chart with additional code. We'll format the x-axis tick labels using the dollar() function from the scales package to help chart viewers understand the content of the chart. Additionally, ggplot2 supports a variety of themes; we'll use the minimal theme here to reduce the prominence of the background. We'll then set an x-axis label, remove the y-axis label, and give the chart a title.
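The customizations listed above could be sketched as follows. As before, the data frame holds placeholder states and values rather than real ACS estimates, and the NAME column is assumed. The specific color ("navy"), point size (4), base font size (14), and label text are also assumptions, since the narration names the kinds of changes but not their exact values.

```r
library(ggplot2)

# Placeholder data: the state list and the values below are illustrative
# stand-ins, NOT real ACS estimates. The column name NAME is assumed.
ne_income <- data.frame(
  NAME = c("Maine", "New Hampshire", "Vermont", "Massachusetts",
           "Rhode Island", "Connecticut", "New York"),
  estimate = c(50000, 70000, 55000, 75000, 60000, 72000, 62000)
)

p <- ggplot(ne_income, aes(x = estimate, y = reorder(NAME, estimate))) +
  # Larger, colored points; color and size are assumed values
  geom_point(color = "navy", size = 4) +
  # Format x-axis tick labels as dollar amounts
  scale_x_continuous(labels = scales::dollar) +
  # Minimal theme; the larger base font size is an assumption
  theme_minimal(base_size = 14) +
  # Set the x-axis label, drop the y-axis label, and add a title
  labs(x = "2016 ACS estimate", y = NULL,
       title = "Median household income by state")

print(p)
```

Here reorder(NAME, estimate) turns the state names into a factor whose levels follow the estimates, so the state with the highest value is drawn at the top of the y-axis and the lowest at the bottom.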
Here is the result, which is much easier to understand because of the modifications we've made. The dots stand out against the white background, and the sorting of the dots in descending order of value helps the viewer make comparisons between states. The plot labels are also larger and more apparent.
Now, you'll learn how to make this plot yourself using R.