Question:
I've been researching how to parallelize for loops in R and found the foreach package, which, from what I understand (correct me if I'm wrong), replaces the for loop as follows:
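library(foreach)
library(doParallel)  # provides a parallel backend; without one, %dopar% runs sequentially
registerDoParallel(cores = 2)
n <- seq_len(10)
# foreach returns the value of each iteration; an assignment such as
# vetor[j] <- j + 1 inside the body is not propagated back to the
# calling environment, so the results are collected with .combine instead
vetor <- foreach(j = n, .combine = c) %dopar% {
  j + 1
}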
My question is how to do this when I have nested loops, such as for(){for(){}}, for(){for(){for(){}}}, … Is it possible to sub-parallelize, that is, to parallelize the inner loops as well?
Answer:
In general, it doesn't pay to parallelize on more than one level. It is possible, but it won't make your code run any faster unless the first level of parallelism leaves some of the computer's resources idle.
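Since the question uses foreach, here is a minimal sketch of how nested loops are usually handled with it. Rather than parallelizing each level separately, foreach's nesting operator %:% merges the loops into a single loop over every (i, j) combination, which the backend then distributes across the workers, so there is still only one level of parallelism. The doParallel backend, the 2 cores, and the placeholder body i * j are assumptions for illustration.

```r
library(foreach)
library(doParallel)

# Assumption: 2 local cores; adjust to your machine
registerDoParallel(cores = 2)

# Equivalent of for (i in 1:3) { for (j in 1:4) { ... } }:
# %:% merges both loops into a single loop of 3 * 4 = 12 tasks,
# so the parallelism happens at one level only.
res <- foreach(i = 1:3, .combine = rbind) %:%
  foreach(j = 1:4, .combine = c) %dopar% {
    i * j  # placeholder for the real loop body
  }

stopImplicitCluster()  # release the workers created by registerDoParallel()

res  # 3 x 4 matrix: row i holds i * (1:4)
```

A third level of nesting follows the same pattern: chain another %:% foreach(...) before %dopar%.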
Nowadays, the easiest way to write parallel code in R is to use the future package in combination with furrr.
Here's a classic example of parallelization:
library(furrr)
#> Loading required package: future
library(purrr)
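plan(multisession)  # run futures in background R sessions, one per available core
fun <- function(x) {
  Sys.sleep(1)  # simulate a task that takes 1 second
  x
}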
system.time(
map(1:4, fun)
)
#> user system elapsed
#> 0.004 0.001 4.020
system.time(
future_map(1:4, fun)
)
#> user system elapsed
#> 0.077 0.012 1.297
Created on 2019-02-13 by the reprex package (v0.2.1)
In this example, the parallel version takes a little more than 1 s while the sequential version takes 4 s, as expected: the four 1-second tasks run at the same time on the 4 workers instead of one after the other.
Now let's add a second level of parallelization.
library(furrr)
#> Loading required package: future
library(purrr)
plan(multisession)
fun <- function(x) {
  Sys.sleep(1)
  x
}
system.time(
future_map(1:4, ~map(1:4, fun))  # outer loop in parallel, inner loop sequential
)
#> user system elapsed
#> 0.090 0.012 4.391
system.time(
future_map(1:4, ~future_map(1:4, fun))  # future_map() at both levels
)
#> user system elapsed
#> 0.065 0.005 4.223
Created on 2019-02-13 by the reprex package (v0.2.1)
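Note that the two forms take very similar times. One reason is that, with plan(multisession) alone, futures created inside a worker are resolved sequentially by default, so the inner future_map behaves like a plain map. More fundamentally, the first level of parallelization already occupies all the workers (one per core), so a second level has no idle resources left to use.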
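A back-of-the-envelope model makes these timings concrete: n independent tasks of t seconds each, on w workers, need at least ceiling(n / w) rounds, so the ideal elapsed time is ceiling(n / w) * t. This is a simplification that assumes equal-length tasks, one task per worker at a time, and no overhead; the function name ideal_elapsed is hypothetical.

```r
# Hypothetical helper (not part of any package): ideal wall-clock time,
# ignoring overhead, for n_tasks independent tasks that each take
# task_time seconds and occupy one of n_workers workers
ideal_elapsed <- function(n_tasks, task_time, n_workers) {
  # tasks run in "rounds" of at most n_workers at a time
  ceiling(n_tasks / n_workers) * task_time
}

ideal_elapsed(4, 1, 1)   # map(1:4, fun): 4
ideal_elapsed(4, 1, 4)   # future_map(1:4, fun) on 4 workers: 1
ideal_elapsed(4, 4, 4)   # 4 outer tasks, each a 4 s sequential inner map: 4
ideal_elapsed(16, 1, 4)  # all 16 inner tasks sharing the same 4 workers: 4
```

Whether the inner level runs sequentially or competes for the same 4 workers, the model gives 4 s, which matches the two nested timings in the reprex.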
The first level might not use all of the computer's resources: for example, if my computer had 8 cores instead of 4, parallelizing only at the first level would leave 4 cores idle. In that case, it would make sense to parallelize at the second level as well. However, this is rare: in general, we parallelize loops whose number of iterations is greater than the number of cores.
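For the rare case where the second level is worth it, future lets you declare one plan per level of nesting with plan(list(...)); without that, futures created inside a worker run sequentially. The sketch below assumes the 8-core scenario above, with 4 outer workers that each get 2 inner workers (4 x 2 = 8). The worker counts are assumptions, and the details of future's protection against nested parallelism can vary between package versions, so treat this as a starting point rather than a tuned setup.

```r
library(future)
library(furrr)

# Assumption: an 8-core machine. Outer level: 4 background R sessions;
# inside each one, a second level of 2 sessions (4 x 2 = 8 in total).
plan(list(
  tweak(multisession, workers = 4),
  tweak(multisession, workers = 2)
))

fun <- function(x) {
  Sys.sleep(1)
  x
}

# Each outer task now spreads its 4 inner tasks over its own 2 workers
res <- future_map(1:4, ~future_map(1:4, fun))

plan(sequential)  # shut down the background workers
```

Starting the inner sessions has its own overhead, which is one more reason to prefer a single level of parallelism whenever the outer loop has enough iterations.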