How to limit execution time but keep the output in R?

I am trying to limit the execution time of an analysis, but I want to keep whatever the analysis has already completed.

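In my case, I am running xgb.cv (from the xgboost R package), and I want to keep all iterations completed before the analysis reaches 10 seconds (or "n" seconds/minutes/hours).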

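I have tried the approach mentioned in this thread, but it stops once 10 seconds are reached without keeping the iterations that had already finished.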

Here is my code:

require(xgboost)
require(R.utils)

data(iris)
train.model <- model.matrix(Sepal.Length~., iris)

dtrain <- xgb.DMatrix(data=train.model, label=iris$Sepal.Length)

evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- sqrt(sum((log(preds) -  log(labels))^2)/length(labels))
  return(list(metric = "error", value = err))}

xgb_grid = list(eta = 0.05, max_depth = 5, subsample = 0.7, gamma = 0.3,
  min_child_weight = 1)

fit_boost <- tryCatch(
            expr = {evalWithTimeout({xgb.cv(data  = dtrain,
                  nrounds     = 10000,
                  objective   = "reg:linear",
                  eval_metric = evalerror, 
                  early_stopping_rounds = 300,
                  print_every_n = 100,
                  params = xgb_grid,
                  colsample_bytree = 0.7, 
                  nfold = 5,
                  prediction = TRUE,
                  maximize = FALSE
                  )}, 
                  timeout = 10)
                  },                                        
            TimeoutException = function(ex) cat("Timeout. Skipping.\n"))

And the output is:

#Error in dim.xgb.DMatrix(x) : reached CPU time limit

Thanks!

Best answer (edit, slightly closer to what you want):

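Wrap the whole thing in R's capture.output() function. This stores all of the evaluation output as an R object (a character vector, one element per printed line). Again, I think you are looking for something more, but this at least stays inside the R session and is malleable. Syntax: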

fit_boost <- capture.output(tryCatch(expr = {evalWithTimeout({...})}))
> fit_boost
 [1] "[1]\ttrain-error:2.033160+0.006109\ttest-error:2.034180+0.017467 "  ...
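
The pattern above can be tried without xgboost. The following is a minimal sketch, not the original call: slow_job() is a hypothetical stand-in for xgb.cv that prints one progress line per step, and base R's setTimeLimit() is assumed here in place of evalWithTimeout(). The point it illustrates is that the tryCatch sits inside capture.output(), so the lines printed before the timeout are still returned.

```r
# Hypothetical stand-in for a long-running job that prints progress lines.
slow_job <- function(n = 100) {
  for (i in seq_len(n)) {
    cat(sprintf("[%d]\tstep done\n", i))
    Sys.sleep(0.05)
  }
  invisible(n)
}

# tryCatch is INSIDE capture.output, so the error is handled before
# capture.output returns and the lines printed so far are kept.
log_lines <- capture.output(
  invisible(tryCatch({
    setTimeLimit(elapsed = 0.5, transient = TRUE)  # assumption: base R limit
    slow_job()
  },
  error = function(e) cat("Timeout. Skipping.\n"),
  finally = setTimeLimit(elapsed = Inf)))          # lift the limit again
)

# Progress lines printed before the limit, then the handler's message.
print(head(log_lines, 3))
print(tail(log_lines, 1))
```

If the tryCatch were placed outside capture.output() instead, the error would propagate through capture.output() and the assignment to log_lines would never happen.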

Original answer:

You can also use sink. Just add this line before you start the cross-validation:

sink("evaluationLog.txt")
fit_boost <- tryCatch(
expr = {evalWithTimeout({xgb.cv(data  = dtrain,
                              nrounds     = 10000,
                              objective   = "reg:linear",
                              eval_metric = evalerror, 
                              early_stopping_rounds = 300,
                              print_every_n = 100,
                              params = xgb_grid,
                              colsample_bytree = 0.7, 
                              nfold = 5,
                              prediction = TRUE,
                              maximize = FALSE
)}, 
timeout = 10)
},                                        
TimeoutException = function(ex) cat("Timeout. Skipping.\n"))
sink()

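The final sink() would normally return output to the console, but in this case it does not, because the error is thrown before it is reached. Once you have run it, though, you can open evaluationLog.txt and voilà: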

[1] train-error:2.033217+0.003705   test-error:2.032427+0.012808 
Multiple eval metrics are present. Will use test_error for early stopping.
Will train until test_error hasn't improved in 300 rounds.

[101]   train-error:0.045297+0.000396   test-error:0.060047+0.001849 
[201]   train-error:0.042085+0.000852   test-error:0.059798+0.002382 
[301]   train-error:0.041117+0.001032   test-error:0.059733+0.002701 
[401]   train-error:0.040340+0.001170   test-error:0.059481+0.002973 
[501]   train-error:0.039988+0.001145   test-error:0.059469+0.002929 
[601]   train-error:0.039698+0.001028   test-error:0.059416+0.003018 
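
Since the error keeps the trailing sink() from running, one option is to close the sink in the finally argument of tryCatch, which runs whether or not an error occurs. This is a hedged sketch under assumptions: the loop is a hypothetical stand-in for xgb.cv, the log goes to a temporary file rather than evaluationLog.txt, and base R's setTimeLimit() is used instead of evalWithTimeout().

```r
log_file <- tempfile(fileext = ".txt")  # hypothetical log location

sink(log_file)
status <- tryCatch({
  setTimeLimit(elapsed = 0.5, transient = TRUE)  # assumption: base R limit
  for (i in 1:100) {                             # stand-in for xgb.cv rounds
    cat(sprintf("[%d]\tstep done\n", i))
    Sys.sleep(0.05)
  }
  "finished"
},
error = function(e) "timed out",
finally = {
  setTimeLimit(elapsed = Inf)  # lift the limit again
  sink()                       # always restore console output
})

# The lines written before the timeout are still in the file.
logged <- readLines(log_file)
print(status)
print(head(logged, 3))
```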

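Of course, a text log is not perfect. I imagine you want to do some manipulation on these values, and this is not the best format for that. Still, converting it into something more manageable is not a tall order. I have not yet found a way to save the actual evaluation_log element of the xgb.cv result before the timeout. It is a very good question.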
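
As a rough illustration of that conversion, the sketch below parses log lines of the form shown above into a data frame. The function name parse_cv_log and the column names are made up for this example, and the pattern assumes the metric is called "error" and is printed as mean+sd, as in the log above. In practice the lines would come from readLines("evaluationLog.txt") or from the capture.output() result.

```r
# Sample lines copied from the log above (tabs or spaces both work).
log_lines <- c(
  "[1]\ttrain-error:2.033217+0.003705\ttest-error:2.032427+0.012808 ",
  "Multiple eval metrics are present. Will use test_error for early stopping.",
  "[101]\ttrain-error:0.045297+0.000396\ttest-error:0.060047+0.001849 "
)

# Hypothetical helper: keep only "[iter] train-error:m+s test-error:m+s" lines.
parse_cv_log <- function(lines) {
  pattern <- paste0(
    "^\\[([0-9]+)\\][[:space:]]+",
    "train-error:([0-9.]+)\\+([0-9.]+)[[:space:]]+",
    "test-error:([0-9.]+)\\+([0-9.]+)[[:space:]]*$"
  )
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- hits[lengths(hits) == 6]  # full match plus five capture groups
  if (length(hits) == 0) {
    return(data.frame())            # nothing parseable
  }
  m <- do.call(rbind, hits)
  data.frame(
    iter             = as.integer(m[, 2]),
    train_error_mean = as.numeric(m[, 3]),
    train_error_std  = as.numeric(m[, 4]),
    test_error_mean  = as.numeric(m[, 5]),
    test_error_std   = as.numeric(m[, 6])
  )
}

cv_log <- parse_cv_log(log_lines)
print(cv_log)
```

Note that with print_every_n = 100 only every hundredth round is printed, so a table built this way is coarser than the full evaluation log would be.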
