Tidy Tuesday live screencast: Analyzing historical phones in R

I'll analyze a dataset about historical phone adoption, without looking at the dataset in advance.

Comments
Author

0:55 Downloading and exploring the dataset [mobile/landline <- tidytuesdayR::tt_load("2020-11-10")$mobile, or $landline]
2:12 Binding two data sets together [rename(subscriptions = mobile_subs), mutate(type = "Mobile"), phones <- bind_rows(mobile, landline)]
4:08 Plot subscriptions for the US [geom_line(), aes(color = type), filter(country == "United States")]
8:42 Plotting multiple (most populated) countries at once [group = interaction(type, country)]
9:30 Country filter functionality [semi_join(country_sizes %>% top_n(10, avg_population), by = "country")]
10:25 facet_wrap( ~ continent), top_n(40)
11:35 Explaining the "group = interaction(type, country)" functionality
13:05 Average subscriptions per person for mobile and landline adoption differences per continent.
14:03 Adding 25th-75th percentile boundaries [summarize(q25, q75), geom_ribbon(alpha = .25)]
17:40 Adding income levels [library(WDI), WDI(start = 2005, end = 2005, extra = TRUE), use: income, iso3c]
19:40 Join both data sets [phones %>% inner_join(country_incomes, by = "code")]
20:00 Plot subscriptions by income levels instead of continent. [income = fct_relevel(income, "Low income" ...)]
21:28 "dot and pipe functionality" [ <- . %>% ]
23:04 Compare mobile/landline across income levels [aes(color = income), facet_wrap(~ type, ncol = 1)]
25:25 Do Population and GDP differ between mobile and landline data sets? [inner_join(), suffix = c("_mobile", "_landline")]
29:16 Looking for aggregation by country, to produce some meaningful summary stats.
30:49 When do countries cross the 50 mobile subscriptions per 100 people. [geom_hline(yintercept = 50, lty = 2)]
32:34 Which countries crossed this threshold first? [group_by(country), summarize(year_past_50 = min(year[subscriptions >= 50]))]
33:35 Use na_if(year_past_50, Inf) to turn the Inf values into NA for countries that never crossed this threshold.
35:28 Visualizing the results [pivot_wider(names_from = type, values_from = subscriptions)]
37:00 Use select(-total_pop, -gdp_per_cap) to make pivot_wider() work.
38:17 Extract GDP data from WDI [indicator = c(gdp = "NY.GDP.PCAP.PP.KD")]
40:00 Correlation between GDP per capita in 2005 and the year of passing 50 mobile subscriptions/100
40:30 Plotting the results [ggplot(aes(gdp_per_capita, year_past_50_mobile, color = continent)), geom_point(), scale_x_log10()]
42:10 Extract population from WDI and use to size the points.
44:33 Investigate peak landline based on GDP per capita.
48:20 Show results per continent [facet_wrap(~ continent), theme(legend.position = "none")]
49:55 Animated map time (Lightning Round)! [library(fuzzyjoin), map_data("world"), regex_left_join(maps::iso3166, c(region = "mapname"))]
53:10 Plot results for year 2000 [world_map_mobile %>% filter(year == 2000) %>% ggplot(aes(long, lat, group = group, fill = subscriptions)) + geom_polygon()]
54:30 Improve map with [coord_fixed(1.5), ggthemes::theme_map(), scale_fill_gradient2(low, high, midpoint)]
56:04 Animation of the map [library(gganimate), transition_manual(year)]
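
The threshold steps at 32:34 and 33:35 can be sketched roughly as follows. This is a minimal illustration on toy data, not the code from the screencast: the `phones` values below are made up, and `crossing` is a hypothetical name. It relies on the fact that `min()` of an empty vector returns `Inf`, which `na_if()` then converts to `NA`.

```r
library(dplyr)

# Toy data (hypothetical values, not the Tidy Tuesday dataset)
phones <- tibble(
  country = rep(c("A", "B", "C"), each = 3),
  year = rep(c(2000, 2005, 2010), times = 3),
  subscriptions = c(10, 60, 90,  5, 20, 55,  1, 2, 3)
)

crossing <- phones %>%
  group_by(country) %>%
  # First year with at least 50 subscriptions per 100 people.
  # For a country that never gets there, min() sees an empty vector
  # and returns Inf (with a warning, silenced here).
  summarize(year_past_50 = suppressWarnings(min(year[subscriptions >= 50]))) %>%
  # Replace Inf with NA so those countries drop out of later plots
  mutate(year_past_50 = na_if(year_past_50, Inf))

crossing
```

Country "C" never reaches 50 in this toy data, so its `year_past_50` ends up as `NA` rather than `Inf`.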

TheDataDigest
Author

This is the first of your Tidy Tuesday screencasts that I have watched, and I learned so much just by observing your coding stream of consciousness. Very helpful and insightful.

peterfortunato
Author

Hi, would someone explain why the [group=interaction] code was necessary when plotting all countries, but not necessary when plotting a single country?
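
On the question above, a likely explanation: by default ggplot2 groups rows by the discrete variables mapped in the plot. With only `color = type`, that gives one line per type, which is enough for a single country but joins all countries into one zigzag line once several are plotted. A minimal sketch on made-up data (the data frame values and the helper `n_lines` are hypothetical, for illustration only):

```r
library(ggplot2)

# Toy data: two countries, two phone types, two years each (made-up values)
phones <- data.frame(
  country = rep(c("A", "B"), each = 4),
  type = rep(rep(c("Mobile", "Landline"), each = 2), times = 2),
  year = rep(c(2000, 2010), times = 4),
  subscriptions = c(10, 80, 40, 30, 5, 60, 20, 15)
)

# Only color = type: ggplot2 forms one group per type,
# so both countries are drawn as a single line for each type
p_by_type <- ggplot(phones, aes(year, subscriptions, color = type)) +
  geom_line()

# group = interaction(type, country): one line per type-country pair
p_by_pair <- ggplot(phones, aes(year, subscriptions, color = type,
                                group = interaction(type, country))) +
  geom_line()

# Hypothetical helper: count the distinct groups the line layer draws
n_lines <- function(p) length(unique(layer_data(p, 1)$group))

n_lines(p_by_type)
n_lines(p_by_pair)
```

With a single country filtered in, the type-only grouping and the type-country grouping coincide, which is why the extra `group` aesthetic is not needed there.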

jaradj
Author

Could you please enable subtitles in English?

alvaromorales