java - MRUnit reducer test: Mismatch in value class

Tags: java hadoop mapreduce reduce mrunit

After migrating to MapReduce 2, my reducer unit test throws a "Mismatch in value class" exception:

Mismatch in value class: expected: class org.apache.hadoop.io.IntWritable, actual: class com.company.MyWritable

The error message itself is clear enough, but I don't understand why MRUnit picks up my custom Writable class instead of IntWritable.

The reducer implementation:

public static class TestCountReduce extends
        Reducer<Text, MyWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<MyWritable> values,
            Context context) throws IOException, InterruptedException {

        ...
        context.write(key, new IntWritable(s.size()));
    }
}

The test setup:

public void setUp() throws IOException {
    Mapper<Object, Text, Text, MyWritable> mapper = new MyMapper();
    Reducer<Text, MyWritable, Text, IntWritable> reducer = new MyReducer();

    mapDriver = new MapDriver<Object, Text, Text, MyWritable>();
    mapDriver.setMapper(mapper);

    reduceDriver = new ReduceDriver<Text, MyWritable, Text, IntWritable>();
    reduceDriver.setReducer(reducer);
}

And finally the test case:

@Test
public void testReducer() throws IOException {
    List<MyWritable> values = new ArrayList<MyWritable>();
    values.add(new MyWritable("1"));
    values.add(new MyWritable("1"));
    reduceDriver.withInput(new Text("testkey"), values);
    reduceDriver.withOutput(new Text("testkey"), new IntWritable(1));
    reduceDriver.runTest();
}

Best answer

Check the signature of the reduce method in your reducer implementation.

It should be

public void reduce(Text key, Iterable<MyWritable> values, Context context) throws IOException, InterruptedException {

not

public void reduce(Text key, Iterator<MyWritable> values, Context context) throws IOException, InterruptedException {

With Iterator in the parameter list, your method is an overload of Reducer.reduce, not an override of it. The framework therefore calls the inherited default implementation, which is an identity function that writes each incoming MyWritable value straight to the output, producing the value class mismatch. Annotating the method with @Override would have turned this silent bug into a compile-time error.
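The overload-versus-override trap can be reproduced without Hadoop at all. In this minimal sketch (class names Base, Sub, and method process are hypothetical, chosen to mirror Reducer.reduce), the subclass declares an Iterator parameter where the base class expects Iterable, so the base class's pass-through implementation runs instead of the subclass's logic:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Base plays the role of org.apache.hadoop.mapreduce.Reducer:
// its default process() is an identity that passes values through.
class Base {
    String process(Iterable<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v); // identity: emit each value unchanged
        }
        return sb.toString();
    }
}

// Sub plays the role of the buggy reducer: Iterator instead of Iterable.
// This declares a NEW overload; it does not override Base.process.
// Adding @Override here would fail to compile, exposing the mistake.
class Sub extends Base {
    String process(Iterator<String> values) {
        int count = 0;
        while (values.hasNext()) {
            values.next();
            count++;
        }
        return "count=" + count;
    }
}

public class OverrideDemo {
    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b");
        Base r = new Sub();
        // The caller (like the MapReduce framework) invokes the Iterable
        // variant, so Sub's method is never called and the identity
        // implementation in Base runs instead.
        System.out.println(r.process(data)); // prints "ab", not "count=2"
    }
}
```

This is exactly why MRUnit sees MyWritable on the output: the test driver invokes the Iterable-typed reduce, which the buggy class never overrode, and the identity reduce forwards the MyWritable inputs to the output unchanged.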

For java - MRUnit reducer test: Mismatch in value class, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24293018/
