Hadley Wickham's 'dplyr' tutorial at useR 2014 (2/2)

Part 2/2 of the dplyr workshop held at UCLA during the useR 2014 conference.

dplyr is the premier data manipulation tool for data analysts who work in the R language. This package makes it easier than ever to sort, manage, and clean your dirty data with speed and efficiency.

Topics covered: grouped mutate/filter, joins, do, databases.

Comments

The Dropbox link does not work, and I also could not find the airports dataset.

ondrejplachy

Darn. Ignore the second problem; it was user error. Very sorry I posted before checking more carefully.

sethchandler

Excited about this video, but I am having two initial problems. First, the flights data frame in nycflights13 does not have a plane column. It has a tailnum column, which appears to be the same thing, but some renaming needs to be done. Second, when I run the code at 1:30 I get the error "Error in n() : This function should not be called directly". I'm not sure what this is about. I am running R 3.1.1 in RStudio 0.98.1028 on OS X.

sethchandler
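
A minimal sketch of the renaming described in the comment above, assuming the data come from nycflights13, where the aircraft identifier column is called tailnum. The toy data frame and its values are made up for illustration. The n() error is often seen when n() is called outside a dplyr verb, or when another package's summarise() masks dplyr's; the comment does not say which applied here, so treat the second half as one possible explanation only.

```r
library(dplyr)

# Hypothetical stand-in for the flights data (values are made up)
flights_toy <- data.frame(
  tailnum   = c("N1", "N1", "N2"),
  arr_delay = c(5, 15, -3)
)

# Rename tailnum to plane so code written against a plane column runs unchanged
flights_toy <- dplyr::rename(flights_toy, plane = tailnum)

# n() is only meaningful inside a dplyr verb; namespacing the verbs guards
# against another package masking group_by() or summarise()
counts <- flights_toy %>%
  dplyr::group_by(plane) %>%
  dplyr::summarise(n_flights = n())
```

With the real data, the same rename() call would be applied to flights once, before running the rest of the workshop code.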

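I think the z-score part is not correct. It should look like this:
library(dplyr)
planes_z <- flights %>%
  filter(!is.na(arr_delay)) %>%   # drop flights with no recorded arrival delay
  group_by(plane) %>%
  filter(n() > 30) %>%            # keep planes with more than 30 flights
  mutate(z_delay = (arr_delay - mean(arr_delay)) / sd(arr_delay)) %>%
  filter(z_delay >= 3) %>%        # keep delays at least 3 SDs above the plane's mean
  select(plane, z_delay) %>%
  arrange(desc(z_delay))
View(planes_z)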

zakkyang
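
For anyone without the workshop data, the grouped z-score pattern in the comment above can be tried on a small made-up data frame. This is only a sketch: the plane labels, delays, and the lowered group-size threshold are assumptions chosen so that the toy data survive the filter.

```r
library(dplyr)

# Made-up delays for two hypothetical planes; plane "A" has one extreme flight
toy <- data.frame(
  plane     = rep(c("A", "B"), each = 5),
  arr_delay = c(0, 0, 0, 0, 100, 10, 12, 8, 11, 9)
)

z_scores <- toy %>%
  filter(!is.na(arr_delay)) %>%
  group_by(plane) %>%
  filter(n() >= 5) %>%   # threshold lowered from 30 to suit the toy data
  mutate(z_delay = (arr_delay - mean(arr_delay)) / sd(arr_delay)) %>%
  ungroup()
```

Because mutate() runs after group_by(), mean() and sd() are computed per plane, so each plane's z-scores have mean 0 and standard deviation 1 within that plane.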