java - MapReduce-Java: calculating the average of an array list

Tags: java eclipse hadoop arraylist mapreduce

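I have been given a MapReduce assignment, and I am new to MapReduce programming.
I want to calculate the average, minimum, and maximum for each year and a specific city.
Here is my sample input: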

Calgary,AB,2009-01-07,604680,12694,2.5207754,0.065721168,0.025668362,0.972051954,0.037000279,0.022319018,,,0.003641149,,,0.002936745,,,0.016723641 Calgary,AB,2009-12-30,604620,12694,2.051769654,0.060114973,0.034026918,1.503277516,0.054219005,0.023258217,,,0.00354166,,,0.003361414,,,0.122375131 Calgary,AB,2010-01-06,604680,12266,4.015745522,0.097792741,0.032738892,0.368454554,0.019228992,0.032882053,,,0.004778065,,,0.003190444,,,0.064203865 Calgary,AB,2010-01-13,604680,12551,3.006492921,0.09051656,0.041508534,0.215395047,0.012081755,0.023706119,,,0.004231772,,,0.003083003,,,0.155212503



I know how to find the city and the year.
I am using this code:
String line = value.toString();
    String[] tokens = line.split(",");
    String[] date = tokens[2].split("-");
    String year = date[0];
    String location = tokens[0];

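Now I want to find these two numbers in each line (for example 2.5207754 and 0.065721168; not these exact values, but whatever numbers come after the third and fourth commas) and then compute the average, minimum, and maximum.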

The output should look like this:

Calgary 2009 average: "" , min; "" , max: "" Calgary 2010 average: "" , min; "" , max: ""



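I tried this code to find the values in each line, but because the data is not laid out the same way in every line, I get errors (some lines have no data in that position, or the data there is longer):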
float number = 0;
    float number2 = 0 ;
    char a;
    char c;
    a = line.charAt(34);
    c = line.charAt(44);
    if (a == ',') 
    { 
        number = Float.parseFloat(line.substring(35, 44));
    }
    else 
    {
        number = Float.parseFloat(line.substring(35, 46));
    }

    if (c == ',')
    {
        number2 = Float.parseFloat(line.substring(45, 56));

    } else 
    {
        number = Float.parseFloat(line.substring(47, 58));
    }

    Text numbers = new Text(number + " " + number2 + " ");

Then I tried this code, but like the code above it does not work:
String number = tokens[4];
String number2 = tokens[5];

Can you help me with this project?

Best answer

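Looking at your input, it appears that your records are separated by spaces. You can split on " " first, then extract the individual values from each record and use them in the calculation.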

        String[] arr = line.split(" ");
        for(String val : arr){
            String[] dataArr = val.split(",");
            String city  = dataArr[0];
            String date = dataArr[2];
            String v1 = dataArr[5];
            String v2 = dataArr[6];
            System.out.println("city: "+city +" date: "+ date +" v1: "+ v1+"v2: "+ v2);
        }

city: Calgary date: 2009-01-07 v1: 2.5207754v2: 0.065721168
city: Calgary date: 2009-12-30 v1: 2.051769654v2: 0.060114973
city: Calgary date: 2010-01-06 v1: 4.015745522v2: 0.097792741
city: Calgary date: 2010-01-13 v1: 3.006492921v2: 0.09051656
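
To get from these per-record values to the average, minimum, and maximum per city and year, one possible approach is sketched below in plain Java, so it can be run without a Hadoop cluster. It is not the answer's own code: the class `CityYearStats` and its methods `keyOf`, `summarize`, and `format` are hypothetical names introduced here for illustration. The sketch assumes that records are separated by whitespace, that the date is always the third field, and that a record whose value column is blank should simply be skipped.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: a local stand-in for the map and reduce steps.
class CityYearStats {

    // "Map" side: build the grouping key "city year" from one split record.
    static String keyOf(String[] fields) {
        String year = fields[2].split("-")[0];
        return fields[0] + " " + year;
    }

    // "Reduce" side, simulated locally: group one column by key and summarize it.
    static Map<String, DoubleSummaryStatistics> summarize(String input, int column) {
        Map<String, DoubleSummaryStatistics> stats = new LinkedHashMap<>();
        for (String record : input.trim().split("\\s+")) {
            // A limit of -1 keeps trailing empty fields, so indices stay stable.
            String[] fields = record.split(",", -1);
            // Assumption: skip records that are too short or have a blank value.
            if (fields.length < 3 || fields.length <= column || fields[column].isEmpty()) {
                continue;
            }
            double value = Double.parseDouble(fields[column]);
            stats.computeIfAbsent(keyOf(fields), k -> new DoubleSummaryStatistics())
                 .accept(value);
        }
        return stats;
    }

    // Format one result line in roughly the shape requested in the question.
    static String format(String key, DoubleSummaryStatistics s) {
        return String.format(Locale.ROOT, "%s average: %.6f , min: %.6f , max: %.6f",
                key, s.getAverage(), s.getMin(), s.getMax());
    }
}
```

Calling `CityYearStats.summarize(line, 5)` on the sample input groups the sixth field (index 5) under the keys `Calgary 2009` and `Calgary 2010`. In an actual Hadoop job, the key and value would more likely be emitted from the mapper, and the summary computed in the reducer over the values received for each key; the grouping logic stays the same.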

Regarding java - MapReduce-Java: calculating the average of an array list, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36534391/
