Load the parallel package, which supersedes the snow package used previously. Check the number of CPU cores.
library(parallel)
detectCores()
## [1] 8
Create the workers.
cl <- makeCluster(detectCores()) # it is often better to specify a smaller number than detectCores()
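One common convention (an assumption here, not something the text requires) is to leave one core free for the OS and other programs; the name `n_workers` is just illustrative:

```r
library(parallel)

# Leave one core free; max() guards against a single-core machine.
n_workers <- max(1, detectCores() - 1)
cl <- makeCluster(n_workers)
stopCluster(cl)
```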
Check the date on each worker.
clusterEvalQ(cl, date())
## [[1]]
## [1] "Sat Nov 15 16:18:40 2014"
##
## [[2]]
## [1] "Sat Nov 15 16:18:40 2014"
##
## [[3]]
## [1] "Sat Nov 15 16:18:40 2014"
##
## [[4]]
## [1] "Sat Nov 15 16:18:40 2014"
##
## [[5]]
## [1] "Sat Nov 15 16:18:40 2014"
##
## [[6]]
## [1] "Sat Nov 15 16:18:40 2014"
##
## [[7]]
## [1] "Sat Nov 15 16:18:40 2014"
##
## [[8]]
## [1] "Sat Nov 15 16:18:40 2014"
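clusterEvalQ() runs an expression on every worker, and each worker is a separate R process. A quick sketch to confirm this: asking each worker for its process ID should return distinct values (a two-worker cluster is assumed here to keep it small):

```r
library(parallel)
cl <- makeCluster(2)

# Each worker is its own R process, so the PIDs differ.
pids <- clusterEvalQ(cl, Sys.getpid())
stopCluster(cl)
```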
Prepare some sample data: a list of vectors.
N = 100 # number of jobs (can be larger than detectCores())
xx = list()
for(i in 1:N) {
xx[[i]] = i:(i+5)
}
xx[1:3] # looks like this
## [[1]]
## [1] 1 2 3 4 5 6
##
## [[2]]
## [1] 2 3 4 5 6 7
##
## [[3]]
## [1] 3 4 5 6 7 8
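As a side note, the same list can be built more idiomatically with lapply(), which constructs the whole list in one call (a sketch, equivalent to the loop above):

```r
# Build the list of vectors without growing it inside a for loop.
N <- 100
xx <- lapply(1:N, function(i) i:(i + 5))
```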
Run a simple serial computation on the data with lapply().
y0 = lapply(xx, mean)
y0[1:3] # looks like this
## [[1]]
## [1] 3.5
##
## [[2]]
## [1] 4.5
##
## [[3]]
## [1] 5.5
Run the same computation in parallel.
y1 = parLapply(cl, xx, mean)
y1[1:3] # looks like this
## [[1]]
## [1] 3.5
##
## [[2]]
## [1] 4.5
##
## [[3]]
## [1] 5.5
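For jobs as cheap as mean() on a six-element vector, the communication overhead usually makes the parallel version slower, not faster. A sketch of how one might check the trade-off, using an artificially slow function (`slow_mean` is hypothetical, not from the text):

```r
library(parallel)
cl <- makeCluster(2)

# A deliberately slow job so per-job cost outweighs communication overhead.
slow_mean <- function(x) { Sys.sleep(0.05); mean(x) }
xx <- lapply(1:20, function(i) i:(i + 5))

t_serial   <- system.time(y_serial   <- lapply(xx, slow_mean))
t_parallel <- system.time(y_parallel <- parLapply(cl, xx, slow_mean))
stopCluster(cl)

identical(y_serial, y_parallel)  # the results agree; only the timing differs
```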
Try using a simple function.
mysimple <- function(x) mean(x)
y2 = parLapply(cl, xx, mysimple)
y2[1:3] # looks like this
## [[1]]
## [1] 3.5
##
## [[2]]
## [1] 4.5
##
## [[3]]
## [1] 5.5
Try using a function that calls other functions internally.
mycomplex <- function(x) {100+mysimple(x)}
parLapply(cl, xx, mycomplex) # this raises an error
## Error in checkForRemoteErrors(val): 8 nodes produced errors; first error: could not find function "mysimple"
Functions and data objects must be exported to the workers explicitly.
clusterExport(cl, "mysimple") # export "mysimple" to all workers
y3 = parLapply(cl, xx, mycomplex)
y3[1:3] # looks like this
## [[1]]
## [1] 103.5
##
## [[2]]
## [1] 104.5
##
## [[3]]
## [1] 105.5
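clusterExport() takes a character vector of names, so several objects can be shipped to every worker in one call. A self-contained sketch (the variable `offset` is hypothetical, introduced only for this example):

```r
library(parallel)
cl <- makeCluster(2)

offset <- 100                          # a plain variable the workers will need
mysimple <- function(x) mean(x)
mycomplex <- function(x) offset + mysimple(x)

# Export both objects to every worker in one call.
clusterExport(cl, c("offset", "mysimple"))
y <- parLapply(cl, list(1:6, 2:7), mycomplex)
stopCluster(cl)
```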
Stop the workers.
stopCluster(cl)
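If an error occurs before stopCluster() is reached, the worker processes linger. One way to guard against that (a sketch; `run_parallel` is a hypothetical helper, not part of the parallel package) is to wrap the cluster in a function and register the cleanup with on.exit():

```r
library(parallel)

# on.exit() guarantees the workers are stopped even if fun() fails partway.
run_parallel <- function(data, fun, n = 2) {
  cl <- makeCluster(n)
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, data, fun)
}

res <- run_parallel(list(1:6, 2:7), mean)
```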