Elastic-Job provides a job monitoring service. Currently the only supported feature is dumping job runtime information.
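Use cases
While using Elastic-Job-Lite you may run into distributed-system problems that make jobs run unstably. Because debugging in a production environment is not possible, the dump command exports the job's internal state so that developers can debug and analyze it. To avoid leaking private data, IP addresses in the dumped information are replaced with placeholders of the form ip1, ip2…, so the environment information can be shared publicly over the Internet, which helps further improve Elastic-Job.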
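To make the IP masking concrete, here is a minimal sketch of how addresses could be replaced with stable ip1, ip2… placeholders. It is not the project's own SensitiveInfoUtils code: the class name IpMaskSketch, the method maskIps, and the loose IPv4 regex are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Elastic-Job.
public final class IpMaskSketch {

    // Loose IPv4 matcher; this pattern is an assumption, not the project's own.
    private static final Pattern IP_PATTERN =
            Pattern.compile("\\b\\d{1,3}(?:\\.\\d{1,3}){3}\\b");

    private IpMaskSketch() {
    }

    // Replaces every IP with an alias; the same IP always maps to the same alias.
    public static List<String> maskIps(List<String> lines) {
        Map<String, String> aliases = new HashMap<>();
        List<String> result = new ArrayList<>(lines.size());
        for (String line : lines) {
            Matcher matcher = IP_PATTERN.matcher(line);
            StringBuffer masked = new StringBuffer();
            while (matcher.find()) {
                String ip = matcher.group();
                String alias = aliases.get(ip);
                if (alias == null) {
                    // Aliases are numbered in order of first appearance.
                    alias = "ip" + (aliases.size() + 1);
                    aliases.put(ip, alias);
                }
                matcher.appendReplacement(masked, alias);
            }
            matcher.appendTail(masked);
            result.add(masked.toString());
        }
        return result;
    }
}
```

Because one alias map is shared across all lines, a reader of the masked dump can still tell which nodes refer to the same server without learning the real address.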
How to enable monitoring
Set the monitorPort property of io.elasticjob.lite.config.LiteJobConfiguration. Monitoring is disabled by default (the property defaults to -1).
io.elasticjob.lite.internal.monitor.MonitorService
...
int port = configService.load(true).getMonitorPort();
// Do not start the monitoring service when the port is less than 0
if (port < 0) {
    return;
}
How to dump job runtime information
echo "dump" | nc [IP of any job server] monitorPort > job_debug_dump.txt
After running the Linux command above, the job runtime information is written to job_debug_dump.txt.
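Where nc is not available, the same request can be sent from Java. The sketch below is a hypothetical client (DumpClientSketch is not part of Elastic-Job). It assumes the monitoring service reads a single line and closes the connection after replying; the 5-second read timeout is an arbitrary choice.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical client, roughly equivalent to: echo "dump" | nc host port
public final class DumpClientSketch {

    private DumpClientSketch() {
    }

    public static String dump(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8));
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            // Assumption: give up if the server stays silent for 5 seconds.
            socket.setSoTimeout(5000);
            // The command must end with a newline because the server reads one line.
            writer.write("dump\n");
            writer.flush();
            // Read until the server closes the connection.
            StringBuilder result = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                result.append(line).append('\n');
            }
            return result.toString();
        }
    }
}
```

A call such as DumpClientSketch.dump("job-server-host", 9888) returns the dump text, which can then be written to a file; the host name here is a placeholder.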
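Implementation analysis
A job configured with a monitoring port starts the monitoring service when the job starts. The service is a socket server that listens on the port configured for the job.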
io.elasticjob.lite.internal.monitor.MonitorService#openSocketForMonitor
...
// Open the socket service
serverSocket = new ServerSocket(port);
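This socket service accepts and handles only one command, dump. When the service receives it, it writes the job runtime information back to the client.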
if (null != cmdLine && DUMP_COMMAND.equalsIgnoreCase(cmdLine)) {
    List<String> result = new ArrayList<>();
    dumpDirectly("/" + jobName, result);
    outputMessage(writer, Joiner.on("\n").join(SensitiveInfoUtils.filterSensitiveIps(result)) + "\n");
}
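The request handling above can be approximated with a small self-contained server. This is a simplified sketch, not the actual MonitorService: the class DumpServerSketch and the Supplier that stands in for the registry-center traversal (dumpDirectly) are assumptions, and IP masking is left out.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the monitoring service; handles one connection at a time.
public final class DumpServerSketch implements Closeable {

    private static final String DUMP_COMMAND = "dump";

    private final ServerSocket serverSocket;
    private final Supplier<List<String>> dumpSource;

    // Port 0 asks the OS for a free port; dumpSource supplies the lines to return.
    public DumpServerSketch(int port, Supplier<List<String>> dumpSource) throws IOException {
        this.serverSocket = new ServerSocket(port);
        this.dumpSource = dumpSource;
        Thread acceptor = new Thread(this::acceptLoop, "dump-server-sketch");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    public int getPort() {
        return serverSocket.getLocalPort();
    }

    private void acceptLoop() {
        while (!serverSocket.isClosed()) {
            // The socket is closed after each request, which signals end of output.
            try (Socket socket = serverSocket.accept()) {
                handle(socket);
            } catch (IOException ex) {
                // accept() fails once close() is called; other I/O errors drop the connection.
            }
        }
    }

    private void handle(Socket socket) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8));
        String cmdLine = reader.readLine();
        // Only "dump" (case-insensitive) is answered; anything else gets no reply.
        if (DUMP_COMMAND.equalsIgnoreCase(cmdLine)) {
            writer.write(String.join("\n", dumpSource.get()) + "\n");
            writer.flush();
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close();
    }
}
```

Closing the connection after each request is what lets a client such as nc know the dump is complete, since the protocol carries no length or end marker.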
DUMP output format
/simpleJob/sharding |
/simpleJob/sharding/4 |
/simpleJob/sharding/4/instance | ip112@-@5872
/simpleJob/sharding/3 |
/simpleJob/sharding/3/instance | ip112@-@5872
/simpleJob/sharding/2 |
/simpleJob/sharding/2/instance | ip112@-@5872
/simpleJob/sharding/1 |
/simpleJob/sharding/1/instance | ip112@-@5872
/simpleJob/sharding/0 |
/simpleJob/sharding/0/instance | ip112@-@5872
/simpleJob/servers |
/simpleJob/servers/ip112 |
/simpleJob/leader |
/simpleJob/leader/sharding |
/simpleJob/leader/election |
/simpleJob/leader/election/latch |
/simpleJob/leader/election/instance | ip112@-@5872
/simpleJob/instances |
/simpleJob/instances/ip112@-@5872 |
/simpleJob/config | {"jobName":"simpleJob","jobClass":"com.xxx.elastic.ext.xxx.job.VMScheduleJob","jobType":"DATAFLOW","cron":"0/5 * * * * ?","shardingTotalCount":5,"shardingItemParameters":"","jobParameter":"","failover":false,"misfire":true,"description":"","jobProperties":{"job_exception_handler":"io.elasticjob.lite.executor.handler.impl.DefaultJobExceptionHandler","executor_service_handler":"io.elasticjob.lite.executor.handler.impl.DefaultExecutorServiceHandler"},"streamingProcess":true,"monitorExecution":true,"maxTimeDiffSeconds":-1,"monitorPort":9888,"jobShardingStrategyClass":"","reconcileIntervalMinutes":10,"disabled":false,"overwrite":true}
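Each line of the dump pairs a registry node path with its value, separated by " |". For offline analysis, the text can be loaded into a map. The parser below is a hypothetical sketch (DumpParserSketch is not part of Elastic-Job) and assumes node paths never contain the " |" sequence themselves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the "path | value" lines of a dump file.
public final class DumpParserSketch {

    private DumpParserSketch() {
    }

    // Returns node path -> value, in the order the lines appear; empty nodes map to "".
    public static Map<String, String> parse(String dumpText) {
        Map<String, String> nodes = new LinkedHashMap<>();
        for (String line : dumpText.split("\r?\n")) {
            // Split on the first separator only, so a value may contain '|' itself.
            int separator = line.indexOf(" |");
            if (separator < 0) {
                continue; // skip blank or malformed lines
            }
            String path = line.substring(0, separator);
            String value = line.substring(separator + 2).trim();
            nodes.put(path, value);
        }
        return nodes;
    }
}
```

With the sample above, looking up /simpleJob/leader/election/instance would yield ip112@-@5872, and /simpleJob/config would yield the job's JSON configuration as a string.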