java - In MapReduce, how do I send an ArrayList as a value from the mapper to the reducer?

Tags: java hadoop arraylist mapreduce

This question already has an answer here:

Output a list from a Hadoop Map Reduce job using custom writable

(1 answer)

Closed 6 years ago.

How can we pass an ArrayList as a value from the mapper to the reducer?

My code essentially applies a set of rules and creates new values (strings) according to them. I keep all of the output (generated after the rules execute) in a list, and now need to send this output (the mapper's value) to the reducer, but I have no way of doing so.

Could someone point me in the right direction?

Adding the code:

package develop;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import utility.RulesExtractionUtility;

public class CustomMap{


    public static class CustomerMapper extends Mapper<Object, Text, Text, Text> {
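        // Rules parsed in setup() from the distributed-cache file; shared by all map() calls.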
        private Map<String, String> rules;
        @Override
        public void setup(Context context)
        {

            try
            {
                // The first cache file is the rules file shipped via job.addCacheFile(...).
                URI[] cacheFiles = context.getCacheFiles();
                setupRulesMap(cacheFiles[0].toString());
            }
            catch (IOException ioe)
            {
                System.err.println("Error reading rules file.");
                System.exit(1);
            }

        }

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {

//          Map<String, String> rules = new LinkedHashMap<String, String>();
//          rules.put("targetcolumn[1]", "ASSIGN(source[0])");
//          rules.put("targetcolumn[2]", "INCOME(source[2]+source[3])");
//          rules.put("targetcolumn[3]", "ASSIGN(source[1])");

//          Above is the "rules", which would basically create some list values from source file

            String [] splitSource = value.toString().split(" ");

            List<String>lists=RulesExtractionUtility.rulesEngineExecutor(splitSource,rules);

//          lists holds values like (name, age) for each line of a huge text file; this is what I want to write to the context and pass to the reducer.
//          As of now I haven't implemented the reducer code, as I'm stuck passing the value from the mapper.

//          context.write(new Text(), lists); ---- I do not have a way of doing this


        }




        private void setupRulesMap(String filename) throws IOException
        {
            Map<String, String> rule = new LinkedHashMap<String, String>();
            // try-with-resources so the reader is closed even if a line fails to parse
            try (BufferedReader reader = new BufferedReader(new FileReader(filename)))
            {
                String line = reader.readLine();
                while (line != null)
                {
                    // each line of the rules file is expected to look like "target=expression"
                    String[] split = line.split("=");
                    rule.put(split[0], split[1]);
                    line = reader.readLine();
                }
            }
            rules = rule;
        }
    }
    public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException, URISyntaxException {


        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: customerMapper <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf);
        job.setJarByClass(CustomMap.class);
        job.setMapperClass(CustomerMapper.class);
        // Placeholder: point this at the HDFS path of the rules file read in setup().
        job.addCacheFile(new URI("Some HDFS location"));

        URI[] cacheFiles = job.getCacheFiles();
        if (cacheFiles != null) {
            for (URI cacheFile : cacheFiles) {
                System.out.println("Cache file ->" + cacheFile);
            }
        }
        // job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Best Answer

To pass an ArrayList from the mapper to the reducer, the object obviously has to implement the Writable interface. Why not try this library?

<dependency>
    <groupId>org.apache.giraph</groupId>
    <artifactId>giraph-core</artifactId>
    <version>1.1.0-hadoop2</version>
</dependency>

It has an abstract class:
public abstract class ArrayListWritable<M extends org.apache.hadoop.io.Writable>
extends ArrayList<M>
implements org.apache.hadoop.io.Writable, org.apache.hadoop.conf.Configurable

You can create your own class, filling in the abstract method and implementing the interface methods with your own code. For example:
public class MyListWritable extends ArrayListWritable<Text>{
    ...
}
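
For completeness, a minimal sketch of what that subclass could look like. It assumes Giraph's ArrayListWritable declares an abstract setClass() hook for registering the element type, which is the pattern its bundled subclasses (for example LongArrayListWritable) follow; treat the hook name and import path as assumptions and check them against the version you pull in:

import org.apache.giraph.utils.ArrayListWritable;
import org.apache.hadoop.io.Text;

// Sketch only: assumes ArrayListWritable exposes an abstract setClass()
// hook plus a setClass(Class) registrar, as Giraph's own subclasses do.
public class MyListWritable extends ArrayListWritable<Text> {

    @Override
    public void setClass() {
        // Register the element type so the parent class can instantiate
        // Text objects when deserializing on the reducer side.
        setClass(Text.class);
    }
}

Whichever list Writable you end up with, the mapper can then emit the whole list under one key, and the driver has to declare the map output value type explicitly, since job.setOutputValueClass(Text.class) no longer describes what the map emits. A sketch against the question's code (the key choice here is illustrative, not prescribed):

// In map(), replacing the commented-out context.write(new Text(), lists);
MyListWritable outValue = new MyListWritable();
for (String entry : lists) {
    outValue.add(new Text(entry));
}
context.write(new Text(splitSource[0]), outValue);

// In main(), so the shuffle knows how to serialize the new value type:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(MyListWritable.class);

If pulling in Giraph for a single class is unattractive, the same idea works with a hand-rolled value type built only on Hadoop's core API; a self-contained sketch:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Writable;

// A list-of-strings value type: Writable only requires write() and readFields().
public class StringListWritable implements Writable {
    private final List<String> values = new ArrayList<String>();

    public List<String> getValues() {
        return values;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(values.size());   // length prefix first
        for (String v : values) {
            out.writeUTF(v);           // then each element
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        values.clear();                // Hadoop reuses Writable instances across records
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            values.add(in.readUTF());
        }
    }
}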

Regarding java - In MapReduce, how do I send an ArrayList as a value from the mapper to the reducer, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30945769/
