quickStart with R (Part 1): Getting started, data wrangling, and EDA

THIRD edition of the introduction to R workshop. Part 1
0:00 - Introduction
10:55 - Reprex - REPRoducible EXample
13:51 - Assignment and Pipes
17:00 - R: a data-first programming approach
18:44 - Tidyverse: a modern data science approach to R
20:17 - Tidy Data
22:01 - outline
22:27 - reproducibility (brief definition)
23:38 - RStudio Projects
24:05 - some reproducibility barriers, e.g. setwd()
25:24 - Literate coding
26:53 - Role in Reproducibility
28:34 - Demonstration
32:53 - demo: RStudio projects
33:04 - demo: Some IDE settings to improve your reproducibility
33:50 - demo: RStudio projects (continued)
36:25 - demo: how to import data
38:11 - demo: Scripts and R Markdown notebooks
39:25 - demo: render different types of reports & literate coding
52:12 - demo: render an MSWord file
53:38 - data wrangling
55:48 - downloading workshop code and data from a GitHub repository
1:01:51 - dplyr
1:04:16 - dplyr::filter() -- subset by row
1:05:14 - dplyr::select() -- subset by column
1:06:22 - dplyr::arrange() -- sort rows by variable
1:10:11 - glimpse()
1:16:53 - dplyr::mutate() -- manipulate variable values
1:22:23 - summarize(); group_by(); count()
1:27:47 - data types - char / dbl / int
1:32:12 - Questions and Answers

Part of the Rfun learning series:
R with the Tidyverse is a data-first coding language that enables reproducible workflows. In this two-part workshop, you’ll learn the fundamentals of R, everything you need to know to quickly get started. You’ll learn how to wrangle data for analysis, get a brief introduction to visualization, practice Exploratory Data Analysis (EDA), and generate reports. By the end of Part 1 you will be able to import data, edit and save scripts, subset data, use projects to organize your work, and develop self-help techniques.
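A minimal sketch of the kind of wrangling pipeline Part 1 covers, assuming the gapminder data referenced in the comments below (the gapminder package and its pop/lifeExp columns are assumptions here, not the workshop's exact example):

library(dplyr)
library(gapminder)

gapminder %>%
  filter(year == 2007) %>%                      # subset rows
  select(country, continent, lifeExp, pop) %>%  # subset columns
  arrange(desc(lifeExp)) %>%                    # sort rows
  mutate(pop_millions = pop / 1e6) %>%          # derive a new variable
  group_by(continent) %>%                       # group, then summarize
  summarize(mean_lifeExp = mean(lifeExp),
            total_pop_millions = sum(pop_millions))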
---
Comments

Thanks, great tutorial. But could you please make the screen font bigger for your future tutorials? It is too small.

CanDoSo_org

John,
I am not sure if you have said this in some of your other videos, but there are fewer issues with NAs in long data than in wide data. The NAs tend to go away when we pivot to long.

haraldurkarlsson

John,
I am not sure if this is of interest to you, but you could use count() instead of group_by() and summarize() (as you stated). For example:
gapminder %>%
  count(year,
        wt = pop / 1e6,
        sort = TRUE,
        name = "world_pop_millions")

This gives the world population (in millions) by year.

haraldurkarlsson
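
For reference, a minimal sketch of the group_by()/summarize() version the comment above is replacing, assuming the intended weight is gapminder's pop column scaled to millions (to match the world_pop_millions name):

library(dplyr)
library(gapminder)

gapminder %>%
  group_by(year) %>%
  summarize(world_pop_millions = sum(pop / 1e6)) %>%
  arrange(desc(world_pop_millions))   # sort = TRUE in count() does this step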

By the way, there is now a "pipe" in base R as well. It is written |> (a vertical bar followed by a greater-than sign), which together look like a small triangle, and it also works in the tidyverse. So if you are typing the pipe out by hand, it is slightly shorter code than %>%.

haraldurkarlsson
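
A short sketch comparing the two pipes the comment above mentions; both lines do the same thing, and the filter condition is only an illustration:

library(dplyr)
library(gapminder)

gapminder %>% filter(continent == "Asia") %>% count(year)   # magrittr pipe (tidyverse)
gapminder |> filter(continent == "Asia") |> count(year)     # base R native pipe, R >= 4.1.0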

R Notebook vs. R Markdown
John,
I am one of those people who is confused about the difference. I have worked mostly with scripts and lately more with R Markdown. However, it seems to me that a script is better until your code is pretty much ready to go. Often when I create an R Markdown file it will break down, perhaps because I knit too often; code hangs up the knitting. I mentioned that I could not tell the difference between a notebook and markdown (R, that is). I created a notebook, but when I knit it, it switched over to R Markdown and stayed that way. Is that normal?

haraldurkarlsson
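
One way to see the difference the comment above asks about is the YAML header: an R Notebook is just an R Markdown (.Rmd) file whose output format is html_notebook (it previews chunks interactively and saves an .nb.html file), while a report uses an output format such as html_document or word_document; knitting to another format typically rewrites that output field, which matches the switching behavior described. A sketch of the two headers (titles are placeholders):

---
title: "My notebook"
output: html_notebook
---

---
title: "My report"
output: html_document
---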

Just a different perspective on reproducibility.
Reproducibility has been and remains the basis of most good scientific work. However, it is not quite like in computer science. In the physical sciences you either collect your own samples or perform experiments in the lab. As long as you describe your techniques and methods, your work should be reproducible. Unfortunately, many journals were not so interested in the methods (they took too much space and increased the cost to the authors) and took the authors' word for it (or the lab benefited from being reputable). Only in a thesis could one find the nitty-gritty details that allowed you to at least attempt to reproduce the work (that is the argument against stapling papers together and calling it a thesis). In many cases, however, even that failed (unknown standards, differences in equipment setup, heterogeneity in samples). So, in comparison, computer and data scientists should have a much easier time reproducing work and have literally no excuse for not being able to do so.

haraldurkarlsson