I'm new to Hadoop and have a question about the mapper's parameters. For the word count example, see the code snippet below:
public static class TokenizerMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    .....
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        .......
    }
}
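I know that the "value" parameter is the line read from the file, but what does the "key" parameter mean? What does it correspond to?
Why is its type LongWritable?
I spent several hours searching the documentation without finding an answer. Can anyone help?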
Best answer
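The key is of type LongWritable because the word count program reads its input with TextInputFormat, which is the default InputFormat when none is set on the job.
According to the Javadoc for TextInputFormat: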
An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Keys are the position in the file, and values are the line of text.
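In other words, the key is the byte offset from the start of the file at which the line begins, which is why it is a long. Suppose your text file contains: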
We are fine.
How are you?
All are fine.
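The input to the mapper is then (offsets start at 0, assuming ASCII text and a single-byte newline):
Key: 0
Value: We are fine.
Key: 13
Value: How are you?
(The first line occupies 13 bytes including the newline character, bytes 0 through 12, so the second line starts at offset 13.)
Key: 26
Value: All are fine.
(The second line is also 13 bytes including the newline, so the third line starts at offset 26 from the beginning of the file.)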
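To check the offset arithmetic without a Hadoop cluster, here is a minimal sketch in plain Java that computes the byte offset at which each line starts. It only imitates the keys TextInputFormat would hand to the mapper; it is not Hadoop's own code. The class name LineOffsets and the method lineStartOffsets are hypothetical, and the sketch assumes UTF-8 text with "\n" line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mimics the keys of TextInputFormat.
public class LineOffsets {

    // Returns the byte offset at which each line starts.
    // Assumption: UTF-8 encoding and '\n' as the only line terminator.
    public static List<Long> lineStartOffsets(String text) {
        List<Long> offsets = new ArrayList<>();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        long lineStart = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                offsets.add(lineStart);   // record where the finished line began
                lineStart = i + 1;        // next line begins right after the newline
            }
        }
        if (lineStart < bytes.length) {
            offsets.add(lineStart);       // last line without a trailing newline
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "We are fine.\nHow are you?\nAll are fine.\n";
        String[] lines = text.split("\n");
        List<Long> offsets = lineStartOffsets(text);
        for (int i = 0; i < lines.length; i++) {
            System.out.println("key=" + offsets.get(i) + " value=" + lines[i]);
        }
    }
}
```

Running main on the three sample lines prints keys 0, 13, and 26. In the word count mapper itself the key is usually ignored, since only the words in the value matter.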
A similar question about the meaning of the Hadoop Mapper parameters can be found on Stack Overflow: https://stackoverflow.com/questions/49124302/