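I am looking for an API that gives access to the Spark Streaming statistics shown in the "Streaming" tab of the history server.
I am mainly interested in the batch processing time, but at least according to the documentation it is not directly available through the REST API: https://spark.apache.org/docs/latest/monitoring.html#rest-api
Any ideas on how to get this kind of information, such as what the "Streaming" tab shows, for running jobs or from the history server?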
Best answer
A metrics endpoint is available on the driver node, on the same port as the Spark UI:
http://<host>:<sparkUI-port>/metrics/json/
The streaming-related metrics have .StreamingMetrics in their names.
Example from a local test job:
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingDelay: {
value: 30
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingStartTime: {
value: 1498124090001
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_schedulingDelay: {
value: 1
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_submissionTime: {
value: 1498124090000
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_totalDelay: {
value: 31
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingStartTime: {
value: 1498124090001
}
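As a rough sketch of how the endpoint and the naming convention above could be used from a script, the following Python fetches the JSON and keeps only the metrics whose names contain .StreamingMetrics. The function names fetch_metrics and streaming_gauges are made up for illustration, and the code assumes the gauges may be nested under a top-level "gauges" key, falling back to the flat layout shown in the sample when that key is absent.

```python
import json
from urllib.request import urlopen


def fetch_metrics(base_url, timeout=5.0):
    """Download and parse the JSON served at <base_url>/metrics/json/."""
    url = base_url.rstrip("/") + "/metrics/json/"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def streaming_gauges(payload):
    """Return {metric_name: value} for metrics whose name contains '.StreamingMetrics.'."""
    # Assumption: gauges may sit under a top-level "gauges" key;
    # otherwise treat the payload itself as the name -> {"value": ...} mapping.
    gauges = payload.get("gauges", payload)
    return {
        name: body["value"]
        for name, body in gauges.items()
        if ".StreamingMetrics." in name
        and isinstance(body, dict)
        and "value" in body
    }
```

With the placeholders filled in, streaming_gauges(fetch_metrics("http://<host>:<sparkUI-port>")) would give a plain dictionary of the streaming metrics for the running job.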
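To get the processing time, we need to take the difference StreamingMetrics.streaming.lastCompletedBatch_processingEndTime -
StreamingMetrics.streaming.lastCompletedBatch_processingStartTime
In the sample above that is 1498124090031 - 1498124090001 = 30 ms, which matches lastCompletedBatch_processingDelay.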
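The subtraction can be sketched in Python as below. This is only an illustration: the helper name last_batch_processing_time_ms is hypothetical, and it assumes the metrics have already been parsed into a dictionary of the form name -> {"value": ...}, as in the sample output. Metrics are matched by suffix because the prefix (application id, driver, app name) differs per job.

```python
END_SUFFIX = ".StreamingMetrics.streaming.lastCompletedBatch_processingEndTime"
START_SUFFIX = ".StreamingMetrics.streaming.lastCompletedBatch_processingStartTime"


def last_batch_processing_time_ms(metrics):
    """Return end time minus start time (ms) of the last completed batch."""

    def lookup(suffix):
        # Match on the suffix, since the leading part of the name is job-specific.
        values = [body["value"] for name, body in metrics.items() if name.endswith(suffix)]
        if len(values) != 1:
            raise KeyError(f"expected exactly one metric ending in {suffix!r}, found {len(values)}")
        return values[0]

    return lookup(END_SUFFIX) - lookup(START_SUFFIX)


# Values copied from the sample output of the local test job.
prefix = "local-1498040220092.driver.printWriter.snb"
sample = {
    prefix + END_SUFFIX: {"value": 1498124090031},
    prefix + START_SUFFIX: {"value": 1498124090001},
}
print(last_batch_processing_time_ms(sample))  # 30
```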
Source: the Stack Overflow question "apache-spark - API for Spark streaming statistics": https://stackoverflow.com/questions/44694424/