Question:
I'm doing the following operation with an array in R:
> m <- matrix(1:9, ncol = 3, nrow = 3)
> m
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> m2 <- m
> for(i in 1:nrow(m)){
+   for(j in 1:ncol(m)){
+     c <- expand.grid((i-1):(i+1), (j-1):(j+1))
+     c <- c[c[,1] > 0 & c[,2] > 0 & c[,1] <= nrow(m) & c[,2] <= ncol(m),]
+     m2[i,j] <- mean(m[c[,1], c[,2]])
+   }
+ }
> m2
     [,1] [,2] [,3]
[1,]  3.0  4.5  6.0
[2,]  3.5  5.0  6.5
[3,]  4.0  5.5  7.0
That is, for each element of the array I am computing the average of its 3x3 neighborhood (the element itself included). However, my arrays are images, and this method turns out to be very inefficient for large arrays…
Does anyone know of a way to vectorize this loop?
Answer:
There are packages that provide this kind of function. Adapting the answer posted on SOen:
library(raster)
r <- raster(m)  # convert the matrix to a RasterLayer
# focal() applies a function over a moving window: w = matrix(1, 3, 3) defines
# the 3x3 neighborhood, pad = TRUE with padValue = NA adds an NA border so edge
# cells still have a full window, and na.rm = TRUE drops those NAs from each mean.
as.matrix(focal(r, w = matrix(1, 3, 3), fun = mean, pad = TRUE, padValue = NA, na.rm = TRUE))
Resulting in:
     [,1] [,2] [,3]
[1,]  3.0  4.5  6.0
[2,]  3.5  5.0  6.5
[3,]  4.0  5.5  7.0
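Before benchmarking, it is worth confirming that the two approaches agree (a minimal check; m2_loop and m2_raster are just illustrative names):
m2_loop <- m2  # result of the loop in the question
m2_raster <- as.matrix(focal(raster(m), matrix(1, 3, 3), mean,
                             pad = TRUE, padValue = NA, na.rm = TRUE))
all.equal(m2_loop, m2_raster)  # TRUE, as the matching printouts above suggest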
We can compare the performance using a slightly larger array:
mbig <- matrix(1:441, ncol = 21, nrow = 21)
library(microbenchmark)
microbenchmark(
  Daniel = {
    m2 <- mbig
    for(i in 1:nrow(mbig)){
      for(j in 1:ncol(mbig)){
        c <- expand.grid((i-1):(i+1), (j-1):(j+1))
        c <- c[c[,1] > 0 & c[,2] > 0 & c[,1] <= nrow(mbig) & c[,2] <= ncol(mbig),]
        m2[i,j] <- mean(mbig[c[,1], c[,2]])
      }
    }
  },
  Raster = {
    r <- raster(mbig)
    # keep the same 3x3 window as above; matrix(1, nrow(mbig), ncol(mbig))
    # would average over a 21x21 window rather than each cell's neighborhood
    as.matrix(focal(r, matrix(1, 3, 3), mean, pad = TRUE, padValue = NA, na.rm = TRUE))
  }
)
The difference is substantial:
Unit: milliseconds
   expr       min        lq     mean    median        uq      max neval cld
 Daniel 116.16301 118.47165 121.1388 119.32912 120.54214 204.8606   100   b
 Raster  13.16006  13.41737  14.0498  13.61459  13.87249  31.2184   100  a
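If you prefer to avoid the package dependency, the same idea can be vectorized in base R by padding the matrix with an NA border and accumulating the nine shifted copies of it. The sketch below assumes a 3x3 window, and neighborhood_mean is just a hypothetical helper name:
neighborhood_mean <- function(m) {
  # NA border: edge cells simply end up with fewer valid neighbors
  p <- matrix(NA, nrow(m) + 2, ncol(m) + 2)
  p[2:(nrow(m) + 1), 2:(ncol(m) + 1)] <- m
  s <- matrix(0, nrow(m), ncol(m))  # running sum over the neighborhood
  n <- matrix(0, nrow(m), ncol(m))  # count of non-NA values per cell
  for (di in 0:2) {
    for (dj in 0:2) {
      shifted <- p[di + 1:nrow(m), dj + 1:ncol(m)]
      ok <- !is.na(shifted)
      s[ok] <- s[ok] + shifted[ok]
      n <- n + ok
    }
  }
  s / n
}
neighborhood_mean(m)  # reproduces m2 from the question
Here the loops run only 9 times regardless of the image size; all per-cell work is done by whole-matrix operations, so it scales far better than the element-by-element version.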