HAProxy stats: qtime, ctime, rtime, ttime?

I'm running a web application behind HAProxy 1.6.3-1ubuntu0.1, and the HAProxy stats report qtime,ctime,rtime,ttime values of 0,0,0,2704.

From the documentation (https://www.haproxy.org/download/1.6/doc/management.txt):

 58. qtime [..BS]: the average queue time in ms over the 1024 last requests
 59. ctime [..BS]: the average connect time in ms over the 1024 last requests
 60. rtime [..BS]: the average response time in ms over the 1024 last requests
     (0 for TCP)
 61. ttime [..BS]: the average total session time in ms over the 1024 last requests

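I expect response times in the 0-10 ms range. A ttime of 2704 ms seems unrealistically high. Is it possible that the units are off, and this is 2704 microseconds rather than 2704 milliseconds?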

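Second, ttime looks suspicious: it isn't even close to qtime+ctime+rtime. Isn't the total time the sum of the queue, connect, and response times? What other time is included in the total but not in queue/connect/response? How can my response time be under 1 ms while my total time is about 2704 ms?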

Here are my full CSV stats:

$ curl "http://localhost:9000/haproxy_stats;csv"
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
http-in,FRONTEND,,,4707,18646,50000,5284057,209236612829,42137321877,0,0,997514,,,,,OPEN,,,,,,,,,1,2,0,,,,0,4,0,2068,,,,0,578425742,0,997712,22764,1858,,1561,3922,579448076,,,0,0,0,0,,,,,,,,
servers,server1,0,0,0,4337,20000,578546476,209231794363,41950395095,,0,,22861,1754,95914,0,no check,1,1,0,,,,,,1,3,1,,578450562,,2,1561,,6773,,,,0,578425742,0,198,0,0,0,,,,29,1751,,,,,0,,,0,0,0,2704,
servers,BACKEND,0,0,0,5919,5000,578450562,209231794363,41950395095,0,0,,22861,1754,95914,0,UP,1,1,0,,0,320458,0,,1,3,0,,578450562,,1,1561,,3922,,,,0,578425742,0,198,22764,1858,,,,,29,1751,0,0,0,0,0,,,0,0,0,2704,
stats,FRONTEND,,,2,5,2000,5588,639269,8045341,0,0,29,,,,,OPEN,,,,,,,,,1,4,0,,,,0,1,0,5,,,,0,5374,0,29,196,0,,1,5,5600,,,0,0,0,0,,,,,,,,
stats,BACKEND,0,0,0,1,200,196,639269,8045341,0,0,,196,0,0,0,UP,0,0,0,,0,320458,0,,1,4,0,,0,,1,0,,5,,,,0,0,0,0,196,0,,,,,0,0,0,0,0,0,0,,,0,0,0,0,
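With more than sixty columns in that dump, counting commas to find qtime through ttime is error-prone. A minimal sketch of looking a column up by its header name follows. The helper names csv_column_index and csv_field are hypothetical, and the code assumes no field contains a quoted comma, which holds for the output above but may not hold for every HAProxy version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the zero-based index of column <name> in a
 * comma-separated header line, or -1 if it is absent. Empty fields count. */
static inline int csv_column_index(const char *header, const char *name)
{
    size_t name_len = strlen(name);
    const char *field = header;
    int index = 0;

    /* The stats header starts with "# "; skip it so "pxname" matches. */
    if (strncmp(field, "# ", 2) == 0)
        field += 2;

    for (;;) {
        size_t len = strcspn(field, ",\n");
        if (len == name_len && strncmp(field, name, len) == 0)
            return index;
        if (field[len] != ',')
            return -1;              /* end of line reached, no match */
        field += len + 1;
        index++;
    }
}

/* Hypothetical helper: copies field <index> of <row> into <out>.
 * Returns 0 on success, -1 if the row is too short or <out> is too small. */
static inline int csv_field(const char *row, int index, char *out, size_t out_size)
{
    const char *field = row;
    size_t len;

    assert(out != NULL && out_size > 0);

    /* Skip <index> fields; strtok() is avoided because it merges empty fields. */
    for (int i = 0; i < index; i++) {
        len = strcspn(field, ",\n");
        if (field[len] != ',')
            return -1;
        field += len + 1;
    }

    len = strcspn(field, ",\n");
    if (len >= out_size)
        return -1;
    memcpy(out, field, len);
    out[len] = '\0';
    return 0;
}
```

Calling csv_column_index on the header line with "ttime" and passing the result to csv_field for the servers,server1 row would yield the string "2704" from the dump above.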

Best answer

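In HAProxy 2.x, the stats page now shows two values, n / n: the maximum within the sliding window and the average over that window. The maximum is kept across all sample windows until a higher value is seen. On 1.8 you only get the average.
Examples from HAProxy 2 vs. 1.8. Note that these proxies are used very differently and carry very different loads.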

So it looks like the average response times, at least since the last restart, are 66 ms and 275 ms.
The average is calculated as follows:

(data time + cumulative HTTP connections - 1) / cumulative HTTP connections

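This may not be a perfect analysis, so improvements are welcome. The aim is to show how I arrived at the answer above, so that you can use the same approach to dig into the other counters you asked about. Most of this comes from reading stats.c. The counters you asked about are defined here.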

unsigned int q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
unsigned int qtime_max, ctime_max, dtime_max, ttime_max; /* maximum of conn_time, queue_time, data_time, total_time observed */

The stats page values are built by the following code:

if (strcmp(field_str(stats, ST_F_MODE), "http") == 0)
    chunk_appendf(out, "<tr><th>- Responses time:</th><td>%s / %s</td><td>ms</td></tr>",
                  U2H(stats[ST_F_RT_MAX].u.u32), U2H(stats[ST_F_RTIME].u.u32));
chunk_appendf(out, "<tr><th>- Total time:</th><td>%s / %s</td><td>ms</td></tr>",
              U2H(stats[ST_F_TT_MAX].u.u32), U2H(stats[ST_F_TTIME].u.u32));

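You asked about all of the counters, but I'll focus on one. As the "Responses time:" snippet above shows, ST_F_RT_MAX and ST_F_RTIME are the values shown on the stats page as n (rtime_max) / n (rtime), respectively. They are defined as follows: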

[ST_F_RT_MAX] = { .name = "rtime_max", .desc = "Maximum observed time spent waiting for a server response, in milliseconds (backend/server)" },
[ST_F_RTIME] = { .name = "rtime", .desc = "Time spent waiting for a server response, in milliseconds, averaged over the 1024 last requests (backend/server)" },

Further down in the code, each of them sets a "metric" value in a case statement (among others):

case ST_F_RT_MAX:
    metric = mkf_u32(FN_MAX, sv->counters.dtime_max);
    break;
case ST_F_RTIME:
    metric = mkf_u32(FN_AVG, swrate_avg(sv->counters.d_time, srv_samples_window));
    break;

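These metric values give a good idea of what the stats page is telling us. The first value in "Responses time: 0 / 0", ST_F_RT_MAX, is the maximum time spent waiting. The second value in "Responses time: 0 / 0", ST_F_RTIME, is the average time spent per connection. These are the maximum and the average over a window, i.e. however long it takes you to get 1024 connections.
For example, "Responses time: 10000 / 20":

Maximum wait time over the last 1024 connections (the highest value ever reached, including HTTP keep-alive time): 10 seconds

Average time over the last 1024 connections: 20 ms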

So, for all intents and purposes:

rtime_max = dtime_max
rtime = swrate_avg(d_time, srv_samples_window)

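That raises the question: what are dtime_max, d_time and srv_samples_window? They make up the data-time window. I couldn't actually work out how these time values are set, but on the face of it, it is "some time" over the last 1024 connections. As pointed out here, keep-alive time is included in the maximum total, which is why that number is high.
Now that we know ST_F_RT_MAX is a maximum and ST_F_RTIME is an average, what is it an average of?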

/* compute time values for later use */
if (selected_field == NULL || *selected_field == ST_F_QTIME ||
    *selected_field == ST_F_CTIME || *selected_field == ST_F_RTIME ||
    *selected_field == ST_F_TTIME) {
    srv_samples_counter = (px->mode == PR_MODE_HTTP) ? sv->counters.p.http.cum_req : sv->counters.cum_lbconn;
    if (srv_samples_counter < TIME_STATS_SAMPLES && srv_samples_counter > 0)
        srv_samples_window = srv_samples_counter;
}

TIME_STATS_SAMPLES is defined as

#define TIME_STATS_SAMPLES 512
unsigned int srv_samples_window = TIME_STATS_SAMPLES;
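Read together with the block that sets srv_samples_window, the divisor used for the average is the number of requests seen so far until that number reaches TIME_STATS_SAMPLES, and TIME_STATS_SAMPLES after that. A minimal sketch of that selection, assuming the two excerpts are the whole story; effective_window is a hypothetical name, not an HAProxy function:

```c
#include <assert.h>

/* Hypothetical helper, not an HAProxy function: returns the divisor the
 * averaging step would use after <samples_seen> requests, given a maximum
 * window of <window_max> samples (TIME_STATS_SAMPLES in the excerpt above). */
static inline unsigned int effective_window(unsigned long long samples_seen,
                                            unsigned int window_max)
{
    assert(window_max > 0);

    /* Fewer samples than the window holds: average over what we have. */
    if (samples_seen > 0 && samples_seen < window_max)
        return (unsigned int)samples_seen;

    /* No samples yet, or the window is full: use the full window size. */
    return window_max;
}
```

With a maximum of 512, a server that has handled 10 requests is averaged over 10 samples, while one that has handled millions is averaged over 512.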

In HTTP mode, srv_samples_counter is sv->counters.p.http.cum_req. http.cum_req corresponds to ST_F_REQ_TOT.

[ST_F_REQ_TOT]  = { .name = "req_tot",  .desc = "Total number of HTTP requests processed by this object since the worker process started" },

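For example, if http.cum_req is 10, then srv_samples_counter will be 10. A sample appears to be a successful request to a given backend server within the given sample window. d_time (data time) is passed in as the "sum" and is computed as some non-negative value, or else the request is counted as an error. I thought I had found the code where d_time is set, but I wasn't sure, so I haven't included it.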

/* Returns the average sample value for the sum <sum> over a sliding window of
 * <n> samples. Better if <n> is a power of two. It must be the same <n> as the
 * one used above in all additions.
 */
static inline unsigned int swrate_avg(unsigned int sum, unsigned int n)
{
    return (sum + n - 1) / n;
}
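The comment on swrate_avg says the same n must be used "in all additions", which suggests the sum is kept as a running, decaying total and not as a list of the last n samples. The update side is not shown in this answer, so the rule below is an assumption: each new sample first removes one rounded-up average share from the sum and then adds itself. The names window_add and window_avg are hypothetical.

```c
#include <assert.h>

/* Assumed update rule (not taken from the excerpts above): drop one
 * rounded-up "average share" from the running sum, then add sample <v>. */
static inline unsigned int window_add(unsigned int *sum, unsigned int n, unsigned int v)
{
    assert(sum != 0 && n > 0);
    *sum = *sum - (*sum + n - 1) / n + v;
    return *sum;
}

/* Rounded-up average of the running sum; same formula as swrate_avg(). */
static inline unsigned int window_avg(unsigned int sum, unsigned int n)
{
    assert(n > 0);
    return (sum + n - 1) / n;
}
```

Under this rule, with n = 4, a single 100 ms sample added to an empty sum reports an average of 25, and repeated 100 ms samples converge to 100. Because the division rounds up, any non-zero sum reports at least 1, so a reported average of 0 implies the sum itself is 0. A single very long sample, such as a connection held open by keep-alive, fades out only gradually.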

A similar question about "HAProxy stats: qtime, ctime, rtime, ttime?" can be found on Stack Overflow: https://stackoverflow.com/questions/48756735/
