Hadoop counter documentation?

Tags: hadoop, counter


After my MapReduce job finishes, I get a large amount of counter information:

        File System Counters
                FILE: Number of bytes read=4386096368
                FILE: Number of bytes written=8805370803
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=54583718086
                HDFS: Number of bytes written=4382090874
                HDFS: Number of read operations=1479
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=369
                Launched reduce tasks=1
                Data-local map tasks=369
                Total time spent by all maps in occupied slots (ms)=34288552
                Total time spent by all reduces in occupied slots (ms)=232084
                Total time spent by all map tasks (ms)=8572138
                Total time spent by all reduce tasks (ms)=58021
                Total vcore-seconds taken by all map tasks=8572138
                Total vcore-seconds taken by all reduce tasks=58021
                Total megabyte-seconds taken by all map tasks=35111477248
                Total megabyte-seconds taken by all reduce tasks=237654016
        Map-Reduce Framework
                Map input records=14753874
                Map output records=666776
                Map output bytes=4383426830
                Map output materialized bytes=4386098552
                Input split bytes=47970
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=4386098552
                Reduce input records=666776
                Reduce output records=666776
                Spilled Records=1333552
                Shuffled Maps =369
                Failed Shuffles=0
                Merged Map outputs=369
                GC time elapsed (ms)=1121584
                CPU time spent (ms)=23707900
                Physical memory (bytes) snapshot=152915259392
                Virtual memory (bytes) snapshot=2370755190784
                Total committed heap usage (bytes)=126644912128
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=49449743227
        File Output Format Counters
                Bytes Written=4382090874

Where can I find an explanation of what each of these fields means? Some of them are fairly obvious (bytes read), but others are more obscure, for example "Total time spent by all maps in occupied slots (ms)" versus "Total time spent by all map tasks (ms)".

I found a list of all the default counters, but I can't find any explanation or description of them.

I'm rather surprised that I can't easily find documentation for this output. Can anyone provide a link or an explanation?

Best answer

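Chapter 8 of Hadoop: The Definitive Guide (full PDF linked from Washington State University) describes the MapReduce-related counters in detail. The discussion starts on page 225, and the counters are listed in Table 8-1. The latest edition of the book (4th edition) is available on Safari Books Online (login required).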
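For the slot-time versus task-time question specifically, the numbers in the job output itself are revealing. The sketch below parses the Job Counters lines from the question and divides each "slots", "vcore" and "megabyte" total by the plain task time. For both maps and reduces, slot time is exactly 4x task time, vcore time equals task time, and "megabyte-seconds" is exactly 4096x task time in ms. My reading, which is an assumption and not something stated in the question, is that on YARN the "occupied slots" time is task time scaled by container memory divided by `yarn.scheduler.minimum-allocation-mb`. That would fit 4096 MB containers, one vcore per task, and a 1024 MB minimum allocation. It would also mean the "-seconds" counters are really accumulated in milliseconds; I believe later Hadoop releases label them "vcore-milliseconds" and "megabyte-milliseconds". The helper names `parse_counters` and `per_task_ms_ratios` are hypothetical.

```python
import re

# Job Counters excerpt copied from the output in the question.
COUNTER_DUMP = """\
        Job Counters
                Launched map tasks=369
                Launched reduce tasks=1
                Total time spent by all maps in occupied slots (ms)=34288552
                Total time spent by all reduces in occupied slots (ms)=232084
                Total time spent by all map tasks (ms)=8572138
                Total time spent by all reduce tasks (ms)=58021
                Total vcore-seconds taken by all map tasks=8572138
                Total vcore-seconds taken by all reduce tasks=58021
                Total megabyte-seconds taken by all map tasks=35111477248
                Total megabyte-seconds taken by all reduce tasks=237654016
"""

# Counter lines look like "<name>=<integer>"; group headers contain no "=".
COUNTER_LINE = re.compile(r"^\s*(.+?)\s*=\s*(\d+)\s*$")


def parse_counters(text: str) -> dict[str, int]:
    """Return {counter name: value}, ignoring group header lines."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def per_task_ms_ratios(c: dict[str, int], kind: str) -> dict[str, float]:
    """Divide slot time, vcore time and MB time by plain task time (ms).

    kind is "map" or "reduce" (hypothetical helper parameter).
    """
    task_ms = c[f"Total time spent by all {kind} tasks (ms)"]
    return {
        # Assumed: task ms * (container MB / scheduler minimum MB).
        "slots": c[f"Total time spent by all {kind}s in occupied slots (ms)"] / task_ms,
        # Assumed: task ms * vcores per container.
        "vcores": c[f"Total vcore-seconds taken by all {kind} tasks"] / task_ms,
        # Assumed: task ms * container MB.
        "megabytes": c[f"Total megabyte-seconds taken by all {kind} tasks"] / task_ms,
    }


if __name__ == "__main__":
    counters = parse_counters(COUNTER_DUMP)
    for kind in ("map", "reduce"):
        print(kind, per_task_ms_ratios(counters, kind))
    # prints:
    # map {'slots': 4.0, 'vcores': 1.0, 'megabytes': 4096.0}
    # reduce {'slots': 4.0, 'vcores': 1.0, 'megabytes': 4096.0}
```

The job's actual memory settings are not shown in the question, so treat the 4096 MB / 1024 MB interpretation as an inference to check against your own `mapreduce.map.memory.mb` and scheduler configuration.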

A similar question about Hadoop counter documentation can be found on Stack Overflow: https://stackoverflow.com/questions/26023508/
