How to parallelize at multiple levels in R?

Question:

I've been researching how to parallelize a for loop in R and found the foreach package, which, if I understand correctly (correct me if I'm wrong), replaces the for loop as follows:

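library(foreach)
library(doParallel)  # registers a parallel backend; without one, %dopar% runs sequentially
registerDoParallel(cores = 2)
n <- seq_len(10)
# foreach returns the value of each iteration; assigning to vetor[j] inside
# the loop would only modify a copy on each worker, leaving vetor unchanged
vetor <- foreach(j = n, .combine = c) %dopar% {
  j + 1
}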

My question is how to do this when I have nested loops, such as for(){for(){}}, for(){for(){for(){}}}, … Is it possible to parallelize the inner loops as well?

Answer:

In general, it does not pay off to parallelize at more than one level. It is possible, but it won't make your code run any faster unless the first level of parallelism fails to use all of the computer's idle resources (CPU cores).
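
Since the question uses foreach: that package can express nested loops with its `%:%` nesting operator. The sketch below assumes the doParallel package as the backend and a 2-worker cluster (both chosen only for illustration), and uses `i * j` as a stand-in for the real loop body. Note that `%:%` merges the two loops into a single stream of (i, j) tasks, so it is still one level of parallelism over all combinations.

```r
library(foreach)
library(parallel)
library(doParallel)   # assumed backend; any registered %dopar% backend works

cl <- makeCluster(2)  # 2 workers, for illustration only
registerDoParallel(cl)

# %:% merges the two loops into one stream of (i, j) tasks, so this is
# a single level of parallelism over all 3 x 4 = 12 combinations.
res <- foreach(i = 1:3, .combine = rbind) %:%
  foreach(j = 1:4, .combine = c) %dopar% {
    i * j   # stand-in for the body of the inner loop
  }

stopCluster(cl)
res   # 3 x 4 matrix with res[i, j] equal to i * j
```

The inner `.combine = c` collects one row per value of i, and the outer `.combine = rbind` stacks those rows.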

Nowadays the easiest way to create parallel code in R is to use the future package in combination with furrr.

Here's a classic example of parallelization:

library(furrr)
#> Loading required package: future
library(purrr)
plan(multisession)

fun <- function(x) {
  Sys.sleep(1)
  x
}

system.time(
  map(1:4, fun)  
)
#>    user  system elapsed 
#>   0.004   0.001   4.020

system.time(
  future_map(1:4, fun)  
)
#>    user  system elapsed 
#>   0.077   0.012   1.297

Created on 2019-02-13 by the reprex package (v0.2.1)

In the example, the parallel version takes a little more than 1s while the sequential version takes 4s, as expected.

Now let's add a second level of parallelization.

library(furrr)
#> Loading required package: future
library(purrr)
plan(multisession)

fun <- function(x) {
  Sys.sleep(1)
  x
}

system.time(
  future_map(1:4, ~map(1:4, fun))  
)
#>    user  system elapsed 
#>   0.090   0.012   4.391

system.time(
  future_map(1:4, ~future_map(1:4, fun))  
)
#>    user  system elapsed 
#>   0.065   0.005   4.223

Created on 2019-02-13 by the reprex package (v0.2.1)

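Note that the two versions take nearly the same time. This happens because the first level of parallelization already uses all of the computer's idle CPU resources, so there is no room left for the second level to gain anything. In fact, with a single-level plan(multisession), the future package evaluates nested futures sequentially by default, so the inner future_map behaves like map here.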
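
The timings above can be checked with a back-of-the-envelope model. The helper `expected_elapsed` below is hypothetical and idealized: it assumes independent tasks of equal duration and ignores the overhead of starting workers and moving data, which is why the measured times are slightly higher.

```r
# Hypothetical, idealized model: n_tasks independent tasks of duration
# task_time, spread over n_workers, with no startup or communication overhead.
expected_elapsed <- function(n_tasks, n_workers, task_time) {
  ceiling(n_tasks / n_workers) * task_time
}

expected_elapsed(4, 1, 1)   # sequential map over 1:4: 4
expected_elapsed(4, 4, 1)   # future_map over 1:4 on 4 workers: 1
expected_elapsed(16, 4, 1)  # 4 x 4 one-second calls on 4 workers: 4
```

Whether the 16 calls are spread over one level or two, 4 workers can run at most 4 of them at a time, so the total cannot drop below 4 seconds.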

The first level might not use all of the computer's resources: if, for example, my computer had 8 cores instead of 4, parallelizing only the first level would leave 4 cores idle. In that case it would make sense to parallelize at the second level as well. However, this is rare: in general, we parallelize loops whose number of iterations is greater than the number of cores.
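
When the outer loop has fewer iterations than there are cores, one option that keeps a single level of parallelism is to flatten the nested loop: build every (i, j) pair up front and map over the pairs. The sketch below is an illustration under assumptions: `inner_work` is a hypothetical stand-in for the inner loop body, and the sizes (2 outer x 4 inner) and `workers = 2` are arbitrary; on a real machine the worker count would usually come from availableCores().

```r
library(furrr)  # also attaches future

plan(multisession, workers = 2)  # worker count chosen only for illustration

# Hypothetical stand-in for the body of the inner loop
inner_work <- function(i, j) {
  i * j
}

# All combinations of outer (i) and inner (j) indices: 2 x 4 = 8 tasks,
# so every worker stays busy even though the outer loop has only 2 iterations
grid <- expand.grid(i = 1:2, j = 1:4)
res <- future_map2_dbl(grid$i, grid$j, inner_work)

# expand.grid varies i fastest, so filling column by column restores
# an outer x inner matrix with mat[i, j] equal to inner_work(i, j)
mat <- matrix(res, nrow = 2)

plan(sequential)  # shut down the background workers
mat
```

Flattening turns "parallelize the second level" into "parallelize one larger first level", which future handles without any nested plan.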
