Running R code in parallel using parallel::clusterApply()

R code is often quick to write, but not always quick enough to run. One strategy to speed up runtimes is to parallelize the code. Here, we create 200 regression models using 200 different predictors: a task well suited to parallelization.

First, we set up the workers using makeCluster(). Next, we create a function that takes a predictor as input and returns a model summary. Then we can create all 200 models with a simple one-liner using lapply(). To parallelize, we have to overcome a small challenge, namely providing the workers with the data using clusterExport(). Then we can simply exchange lapply() for clusterApply() to run our code in parallel.
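The steps above can be sketched as follows. This is a minimal illustration, not the code from the video: the data frame df, the predictor count, and the helper fit_model are simulated/illustrative assumptions.

```r
library(parallel)

# Simulated data: an outcome y and 200 predictor columns x1..x200 (assumption)
set.seed(42)
n_pred <- 200
df <- as.data.frame(matrix(rnorm(100 * n_pred), nrow = 100))
names(df) <- paste0("x", seq_len(n_pred))
df$y <- rnorm(100)

# Function: takes a predictor name, returns a model summary
fit_model <- function(predictor) {
  summary(lm(reformulate(predictor, response = "y"), data = df))
}

# Sequential version: a simple one-liner with lapply()
models_seq <- lapply(names(df)[1:n_pred], fit_model)

# Parallel version: set up workers, ship them the data, swap in clusterApply()
cl <- makeCluster(2)                  # worker count: adjust to your machine
clusterExport(cl, varlist = "df")     # workers start with empty workspaces
models_par <- clusterApply(cl, names(df)[1:n_pred], fit_model)
stopCluster(cl)
```

Note that clusterExport() is needed because each PSOCK worker is a fresh R session: without it, the workers would not find df and every model fit would fail.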

The bench::mark() function shows the speed improvement this gives us.
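A self-contained sketch of such a benchmark is below; again, df and fit_model are illustrative stand-ins, and the bench package must be installed.

```r
library(parallel)

# Small simulated data set (assumption, not the video's data)
set.seed(1)
df <- as.data.frame(matrix(rnorm(100 * 50), nrow = 100))
names(df) <- paste0("x", 1:50)
df$y <- rnorm(100)

fit_model <- function(predictor) {
  summary(lm(reformulate(predictor, response = "y"), data = df))
}

cl <- makeCluster(2)
clusterExport(cl, "df")

timings <- bench::mark(
  sequential = lapply(names(df)[1:50], fit_model),
  parallel   = clusterApply(cl, names(df)[1:50], fit_model),
  check = FALSE  # fitted summaries carry environments; compare timings only
)
stopCluster(cl)
print(timings[, c("expression", "median")])
```

On a toy example this small, the parallel version can even be slower than lapply() because of communication overhead; the speed-up appears once each model fit does enough work to outweigh that overhead.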

Code can be found here:

All the best for speeding up your R code!

Thumbnail image: Chait Goli from Pexels

Contact me, e.g. to discuss (online) R workshops / trainings / webinars:

Playlist: Music chart history
Comments

I've used parallel::detectCores() a lot, taught it in workshops, and also used it in this video. However, a number of serious problems may arise from using this function. A better alternative is parallelly::availableCores(). Thanks to Henrik Bengtsson!

See this newer video:
Why You Should NOT use parallel::detectCores() in R
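A minimal sketch of the swap, assuming the parallelly package is installed: availableCores() respects container/cgroups limits, HPC scheduler allocations, and R options, whereas detectCores() just reports the hardware count.

```r
# Preferred: honors limits imposed by containers, schedulers, and options
n_workers <- parallelly::availableCores()

cl <- parallel::makeCluster(n_workers)
parallel::stopCluster(cl)
```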

StatistikinDD

Great video, glad I found your channel

MsBainy

"If you have ssh installed, you can specify a list of machines for the first argument:

cl <- makeCluster(c("n1", "n2", "n3", "n4"))".

How do I get the names of the machines to build that list?

lucianomaldonado