google-cloud-platform – 如何从com.google.api.client.googleapis.json.GoogleJsonResponseException上的云数据流作业恢复失败:410已过

运行4个小时后,我的Cloud Dataflow作业神秘地失败,因为工作人员四次抛出此异常(在一个小时的时间内).异常堆栈看起来像这样.

java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone { "code" : 500, "errors" : [ { "domain" : "global", "message" : "Backend Error", "reason" : "backendError" } ], "message" : "Backend Error" }

at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriter.close(FileBasedSink.java:516)
at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriter.close(FileBasedSink.java:419)
at com.google.cloud.dataflow.sdk.io.Write$Bound$2.finishBundle(Write.java:201) Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone { "code" : 500, "errors" : [ { "domain" : "global", "message" : "Backend Error", "reason" : "backendError" } ], "message" : "Backend Error" }
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

堆栈跟踪中的类都没有直接来自我的工作,所以我甚至无法捕获并恢复.

我检查了我的区域,云存储(由同一项目拥有)等,它们都没问题.其他工人也跑得很好.看起来像Dataflow中的某种错误?如果没有别的,我真的想知道如何从中恢复:这项工作完全花了30个小时,现在生成了一堆临时文件,我不知道它们有多完整……如果我重新运行这份工作我担心它会再次失败.

对于Google员工,工作ID是2016-08-25_21_50_44-3818926540093331568.谢谢!!

最佳答案 解决方案是在输出上指定withNumShards(),其值为固定值< 10000.这是我们希望将来删除的限制.

点赞