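I have to smooth a large time series, and I'm using the movingFun function from the 'raster' package. I tested several options based on previous posts (see the options below). The first two work, but they are very slow on the real data (the complete MOD13Q1 time series for all of Australia). So I tried option 3, and it failed. I would appreciate it if someone could point out the error in that function. Memory is not a constraint: I'm using an RStudio server with 700 GB of RAM. However, I'm not sure what the best way to do this job is. Thanks in advance.
a) Using movingFun and overlay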
library(raster)
r <- raster(ncol=10, nrow=10)
r[] <- runif(ncell(r))
s <- brick(r,r*r,r+2,r^5,r*3,r*5)
ptm <- proc.time()
v <- overlay(s, fun=function(x) movingFun(x, fun=mean, n=3, na.rm=TRUE, circular=TRUE)) #works
proc.time() - ptm
user system elapsed
0.140 0.016 0.982
b) Wrapping it in a function and using clusterR. I expected this to be faster than (a).
fun1 = function(x) {overlay(x, fun=function(x) movingFun(x, fun=mean, n=6, na.rm=TRUE, circular=TRUE))}
beginCluster(4)
ptm <- proc.time()
v = clusterR(s, fun1, progress = "text")
proc.time() - ptm
endCluster()
user system elapsed
0.124 0.012 4.069
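c) I found this document, written by Robert J. Hijmans, and I tried (and failed) to write a function like the one described in the vignette. I could not fully follow all the steps in that function, which is why it fails.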
smooth.fun <- function(x, filename='', smooth_n ='',...) { #x could be a stack or list of rasters
  out <- brick(x)
  big <- ! canProcessInMemory(out, 3)
  filename <- trim(filename)
  if (big & filename == '') {
    filename <- rasterTmpFile()
  }
  if (filename != '') {
    out <- writeStart(out, filename, ...)
    todisk <- TRUE
  } else {
    vv <- matrix(ncol=nrow(out), nrow=ncol(out))
    todisk <- FALSE
  }
  bs <- blockSize(out)
  pb <- pbCreate(bs$n)
  if (todisk) {
    for (i in 1:bs$n) {
      v <- getValues(out, row=bs$row[i], nrows=bs$nrows[i] )
      v <- movingFun(v, fun=mean, n=smooth_n, na.rm=TRUE, circular=TRUE)
      out <- writeValues(out, v, bs$row[i])
      pbStep(pb, i)
    }
    out <- writeStop(out)
  } else {
    for (i in 1:bs$n) {
      v <- getValues(out, row=bs$row[i], nrows=bs$nrows[i] )
      v <- movingFun(v, fun=mean, n=smooth_n, na.rm=TRUE, circular=TRUE)
      cols <- bs$row[i]:(bs$row[i]+bs$nrows[i]-1)
      vv[,cols] <- matrix(v, nrow=out@ncols)
      pbStep(pb, i)
    }
    out <- setValues(out, as.vector(vv))
  }
  pbClose(pb)
  return(out)
}
s <- smooth.fun(s, filename='test.tif', smooth_n = 6, format='GTiff', overwrite=TRUE)
Error in .local(.Object, ...) :
`/path-to-dir/test.tif' does not exist in the file system,
and is not recognised as a supported dataset name.
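Best answer: Here is the solution I found, with thanks to my colleague. It processes each year (23 files) in 20 minutes. There are probably things that could be improved, but at this stage I'm happy that a year takes only 20 minutes.
Here I use the foreach package to process 5 years at the same time. Within each year, a for loop fills an array that holds 6 files; remember, I need a 3-month moving window, which in the 16-day MOD13Q1 dataset is 6 files. At each step the loop computes the mean over the array with colMeans, creates an empty raster, assigns the means to the new raster, and saves it. Note that we re-created the circular option of the movingFun function, so the means for the first dates of a year are computed using the last dates of the same year.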
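The windowing logic described above can be checked in isolation before running it on real rasters. The following is a minimal base-R sketch, assuming a toy matrix `series` (one row per file, one column per pixel) in place of the values read from the NDVI files; `series`, `smoothed`, and `direct` are illustrative names that are not part of the original script. It uses the same index arithmetic and ring buffer, then compares the result against a direct computation of the wrapped window.

```r
# Toy stand-in for one year of rasters: 23 "files" (rows) x 4 pixels (columns).
# In the real script each row would come from getValues(raster(file)).
set.seed(1)
nfiles <- 23
series <- matrix(runif(nfiles * 4), nrow = nfiles)

k  <- 6                    # window size, as in the script
k2 <- floor((k - 1) / 2)   # number of files the window looks ahead

win  <- matrix(NA_real_, nrow = k, ncol = ncol(series))  # ring buffer
slot <- 0
smoothed <- matrix(NA_real_, nrow = nfiles, ncol = ncol(series))

for (n in (1 - (k - 1)):nfiles) {
  i <- (n + k2 - 1) %% nfiles + 1   # file to load, wrapping around the year
  slot <- slot %% k + 1             # overwrite the oldest row of the buffer
  win[slot, ] <- series[i, ]
  if (n > 0) smoothed[n, ] <- colMeans(win)  # buffer is full from n = 1 onwards
}

# Direct check: with k = 6, output n averages files n-3 .. n+2, wrapped circularly.
direct <- t(sapply(seq_len(nfiles), function(n) {
  idx <- ((n - 3):(n + 2) - 1) %% nfiles + 1
  colMeans(series[idx, , drop = FALSE])
}))
stopifnot(isTRUE(all.equal(smoothed, direct)))
```

With an even window size the window cannot be centred, so in this arithmetic it reaches three files back and two files ahead of position n. The full script used on the real data follows.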
require(raster)
require(rgdal)
library(foreach)
library(doParallel)
rasterOptions(maxmemory = 3e10, chunksize = 2e10)
ip <- "directory-with-grids"
op <- "directory-where-results-are-being-saved"
years <- 2000:2017
k <- 6 # moving window size (6 x 16-day composites, about 3 months)
k2 <- floor((k - 1) / 2) # number of files the window looks ahead
# start 5 workers; each task processes one year
cl <- makeCluster(5, outfile = "") # make worker prints visible
registerDoParallel(cl)
foreach(j = seq_along(years), .packages = c("raster")) %dopar% {
  slot <- 0 # ring-buffer position, reset for every year
  ip1 <- file.path(ip, years[j])
  ndvi.files <- list.files(ip1, pattern = 'ndvi.*tif$', full.names = TRUE)
  nfiles <- length(ndvi.files)
  # n starts below 1 so that the buffer is already full when n reaches 1
  for (n in (1 - (k - 1)):nfiles) {
    i <- (n + k2 - 1) %% nfiles + 1 # file to load, wrapping around the year
    print(ndvi.files[i])
    r <- raster(ndvi.files[i])
    if (slot == 0) {
      # one row per file in the window, one column per cell
      win <- matrix(data = NA, nrow = k, ncol = ncell(r))
    }
    slot <- slot %% k + 1 # overwrite the oldest row
    win[slot, ] <- getValues(r)
    if (n > 0) {
      o <- raster(extent(c(xx, xx, xx, xx))) # your extent
      res(o) <- c(xx, xx) # your resolution
      crs(o) <- '+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0'
      o[] <- colMeans(win)
      o[o < 0] <- NA
      # write out o; note that names(r) belongs to the most recently
      # loaded file (index i), not to file n
      fname <- paste(names(r), 'smoothed', sep = '_')
      out.dir <- file.path(op, years[j])
      dir.create(out.dir, showWarnings = FALSE)
      out.path <- file.path(out.dir, fname)
      writeRaster(o, out.path, format = "GTiff", overwrite = TRUE, datatype = 'INT2S')
    }
  }
}
stopCluster(cl)
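On option (c) from the question, an untested reading of the code suggests two likely problems. First, after `writeStart(out, ...)` the loop calls `getValues(out, ...)`, so it reads from the file that is still being written; this would be consistent with the "does not exist in the file system" error, and the vignette pattern reads from the input `x` instead. Second, for a multi-layer object `getValues` returns a matrix with one row per cell and one column per layer, so the smoother has to run along each row rather than over the whole matrix at once. The sketch below illustrates only the second point in base R. `circ_mean` is a hypothetical helper standing in for `movingFun`, and its window alignment for even window sizes may differ from `movingFun`'s.

```r
# Hypothetical helper: circular moving mean of one time series.
# The window covers positions pos - floor(k/2) .. pos - floor(k/2) + k - 1,
# wrapped around the ends of the series.
circ_mean <- function(x, k) {
  n <- length(x)
  back <- floor(k / 2)
  sapply(seq_len(n), function(pos) {
    idx <- (pos - back + 0:(k - 1) - 1) %% n + 1
    mean(x[idx], na.rm = TRUE)
  })
}

# Stand-in for one block from getValues(): 5 cells (rows) x 6 layers (columns).
set.seed(42)
block <- matrix(runif(5 * 6), nrow = 5)

# Smooth along time (across columns) for every cell.
# apply() returns layers x cells, so transpose back to cells x layers.
smoothed_block <- t(apply(block, 1, circ_mean, k = 3))
```

In a block-wise function, a matrix shaped like `smoothed_block` is what would be passed to `writeValues(out, smoothed_block, bs$row[i])`, with the input values read from `x` rather than from `out`.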