Introduction to Feature Engineering | Introduction to dplyr Part 4

In the final tutorial of the dplyr series, we will cover ways to do feature engineering both with dplyr (“mutate” and “transmute”) and base R (“ifelse”). You’ll learn how to impute missing values as well as create new values based on existing columns. In addition, we’ll go over four different ways to combine datasets. If you’ve followed all the videos in the series, you should be ready to get up and running with dplyr and use it to tackle a range of data manipulation tasks.
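The techniques named in the description can be sketched as follows. This is a minimal illustration, not the code from the video: the `scores` and `regions` tables and all their values are made up, and it assumes the "four different ways to combine datasets" are dplyr's four mutating joins, which the description does not confirm.

```r
library(dplyr)

# Hypothetical toy data; not the dataset used in the video.
scores <- tibble(
  country = c("Austria", "Norway", "Chile", "Kenya"),
  gdp     = c(50, 75, NA, 2),
  pop     = c(9, 5, 19, 54)
)
regions <- tibble(
  country = c("Austria", "Norway", "Chile", "Peru"),
  region  = c("Europe", "Europe", "Americas", "Americas")
)

# mutate() keeps the existing columns and adds new ones.
# ifelse() replaces the missing gdp with the mean of the observed values.
engineered <- scores %>%
  mutate(
    gdp_filled  = ifelse(is.na(gdp), mean(gdp, na.rm = TRUE), gdp),
    gdp_per_pop = gdp_filled / pop
  )

# transmute() returns only the columns named in the call.
slim <- scores %>%
  transmute(country, high_pop = pop > 10)

# Four mutating joins, with the key named explicitly through `by`.
inner <- inner_join(scores, regions, by = "country") # countries in both tables
left  <- left_join(scores, regions, by = "country")  # every row of scores
right <- right_join(scores, regions, by = "country") # every row of regions
full  <- full_join(scores, regions, by = "country")  # every row of either table
```

Naming the key with `by` avoids relying on the default, which joins on all columns the two tables have in common and prints a message saying which ones it used.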

Code:

Introduction to R:

Watch the full series:

Be sure to also check our accompanying blog post here:


#featureengineering #dplyr #rprogramming
Comments

Dear Presenter, thanks for the intro series to dplyr. I liked the tutorial dataset that you built with the dplyr functions, since my home country, Austria, can be seen in the top ranks :-) A few thoughts, though:
1. Following already written code, organized via comments in the RStudio source editor, would be much easier (as in the ML/caret tutorial on your channel by Dave Langer) than using only the R console, which has no code highlighting either.
2. A stepwise introduction to the dplyr functions and the results they produce would be more transparent before starting on fairly long pipe statements in which these functions are chained together.
3. For the join functions shown, the option of defining the primary key via the 'by' argument is painfully missing, so the default key was not quite clear.
Thanks again for your efforts and this great channel. Sincerely, Gregory

gregorkvas

That editor in the intro. I hope the RStudio team can make a theme like that (a darker theme than they have now).

rexevan

Thanks for the series. One question: at the end of this video you do give the disclaimer that for purposes of time you were only demonstrating with subset dataframes. So then if one is working with very large datasets, would these joins in R still be recommended or would you recommend some other method?

sann