hadoop - UDF error with Pig - could not resolve using imports

Tags: hadoop apache-pig cloudera

I am using the Cloudera VM on CentOS, set up as a single-node Hadoop cluster, with Eclipse Luna.

I have a UDF written for use with Pig. This is the first UDF I have written for Pig; my previous Pig scripts ran fine without UDFs. When I run this Pig script, I get the following error:

Failed to generate logical plan. Nested exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve EasyDates.EasyDateMethods.exec using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

The error occurs at the line of the Pig script beginning with "CALC_UR_DAYS_BETWEEN". See below.

I have spent 3-4 hours searching the internet (and testing), and every post boils down to the same advice:

  • set the classpath correctly,
  • make sure you REGISTER your UDF,
  • make sure the jar file name matches the package name,
  • make sure the package name is a directory on the working path, with the same name as the package.

I have done all of these, and I still get the error.

As far as I can tell, everything is named correctly and located where it should be:

  • Java package name: EasyDates
  • Jar name: EasyDates.jar
  • Jar path: /home/cloudera/data/EasyDates/
  • Class name: EasyDateMethods
  • Set in .bash_profile: CLASSPATH=$CLASSPATH:/usr/jars/:/home/cloudera/data/EasyDates/

After several hours I have exhausted the posts on this and cannot find anything else to try. Any further insight is much appreciated!

Java source:

package EasyDates;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;


public class EasyDateMethods extends EvalFunc<String> {

    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return "0";

        try{
            Date date1;
            Date date2;
            String strDiff="0";
            int intDiff = 0;
            //Get the two string dates from the tuple:
            String strDate1 = (String)input.get(0);
            String strDate2 = (String)input.get(1);
            //Convert them to Dates
            date1 = stringToDate(strDate1);
            date2 = stringToDate(strDate2);
            //Get the date difference:
            intDiff = getDaysBetween(date1, date2);
            //Since I must return the same data type as I call for this Pig method, this converts the
            //difference in days to a string.
            return Integer.toString(intDiff);

        }catch(Exception e){
            //Wrap and rethrow so Pig reports the failing row.
            //(WrappedIOException is deprecated; a plain IOException with a cause works too:
            // throw new IOException("Caught exception processing input row", e);)
            throw WrappedIOException.wrap("Caught exception processing input row ", e);
        }

    }

    private Date stringToDate(String theDateString) {
        //Make sure the Pig script formats the date this way, or whatever format you choose.
        //The pattern here must agree with what the script passes in.
        SimpleDateFormat dateFormatter = new SimpleDateFormat ( "dd-MMM-yyyy" );

        java.util.Date dateObject = null;

        try {

            dateObject = dateFormatter.parse ( theDateString );

            System.out.println( dateObject );
            System.out.println( dateFormatter.format ( dateObject ) );

        } catch ( Exception e) {

            System.out.println( e.getMessage() + " " + e.getStackTrace() );

        }
        return dateObject;

    }


    static int getDaysBetween(Date curDate, Date prevDate) {
        //Precondition:  the difference in days between the current meter read date and the last one is not known
        //Postcondition: the difference in days between the current meter read date and the last one is known
        Calendar currentDate = Calendar.getInstance();
        Calendar previousDate = Calendar.getInstance();
        currentDate.setTime(curDate);
        previousDate.setTime(prevDate);
        int theDiffinDays = 0;
        int theDiffinYears = 0;
        int currentDay;
        int previousDay;
        int currentYear;
        int previousYear;
        try {


            currentDay = currentDate.get(Calendar.DAY_OF_YEAR);
            System.out.println("currentDay is " + currentDay);
            previousDay = previousDate.get(Calendar.DAY_OF_YEAR);
            System.out.println("previousDay is " + previousDay);
            currentYear = currentDate.get(Calendar.YEAR);
            System.out.println("currentYear is " + currentYear);
            previousYear = previousDate.get(Calendar.YEAR);
            System.out.println("previousYear is " + previousYear);

            if (currentYear == previousYear) {
                theDiffinDays = currentDay - previousDay;
            }
            else
            {
                theDiffinYears = currentYear - previousYear;
                //This assumes 2 contiguous years, eg 2016 and 2017; so this wouldn't work if the diff in years is greater than 1
                if (isLeapYear(previousYear)) {
                    //The following has not been corrected for leap year:
                    //If the previous year is a leap year
                    theDiffinDays = 366 - previousDay + currentDay;
                }
                else {
                    //If the current year is a leap year or neither year is a leap year: (because the day of year should be inherent whether leap or not)
                    theDiffinDays = 365 - previousDay + currentDay;
                }
            }
            //return theDiffinDays;
        }
        catch (Exception ex){
            System.out.println(ex.getMessage() + " " + ex.getStackTrace());
        }
        return theDiffinDays;
    }

    private static boolean isLeapYear(int theYear){
        //Precondition:  the year is not designated as a leap year or not
        //A year is a leap year if it is divisible by 4, except century years,
        //which must also be divisible by 400. This replaces the hard-coded
        //switch over individual years, which silently returned false for any
        //year it did not list (e.g. 2000 or 2052).
        return (theYear % 4 == 0 && theYear % 100 != 0) || theYear % 400 == 0;
    }

}
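As an aside, on Java 8+ the manual day-of-year arithmetic and the leap-year check above can be replaced entirely by java.time, which handles leap years and gaps of more than one year by itself. A minimal sketch (the class and method names here are mine for illustration, not part of the original UDF):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

//Hypothetical helper showing the same calculation with java.time.
public class EasyDateSketch {

    //The pattern must agree with what the Pig script emits, just as with the UDF above.
    //Locale.ENGLISH keeps "May", "Feb", etc. parseable regardless of the JVM's default locale.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("dd-MMM-yyyy", Locale.ENGLISH);

    //Days from prevDate to curDate; ChronoUnit.DAYS accounts for leap days
    //and multi-year spans, unlike the day-of-year subtraction above.
    static long getDaysBetween(String prevDate, String curDate) {
        return ChronoUnit.DAYS.between(LocalDate.parse(prevDate, FMT),
                                       LocalDate.parse(curDate, FMT));
    }

    public static void main(String[] args) {
        System.out.println(getDaysBetween("12-May-2014", "27-May-2014")); // 15
        //This span crosses the leap day 29-Feb-2016:
        System.out.println(getDaysBetween("28-Feb-2016", "01-Mar-2016")); // 2
    }
}
```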

Pig script:

--Simple Pig script to read in a file with dates, and pass the dates to the EasyDate class 

REGISTER /home/cloudera/data/EasyDates/EasyDates.jar;
DEFINE DaysBetween EasyDates.EasyDateMethods;


----------------------------------------------------Load the file--------------------------------------------
--The file needs two different dates in one row for this test
devicePageCountAll = LOAD 'Data_For_Test_Jar.txt' USING PigStorage('\t')
                        AS (
                        account_code:chararray, 
                        serial_number:chararray,    
                        reported_date:chararray,
                        reported_date2:chararray);
--dump devicePageCountAll;

--------------------------------------------------Get the date difference in days and store the result-----------------

devicePageCountAll2 = foreach devicePageCountAll {

CALC_UR_DAYS_BETWEEN = DaysBetween((ToString(REPLACE(reported_date, '\\"', ''), 'yyyy-MM-dd')), (ToString(REPLACE(reported_date2, '\\"', ''), 'yyyy-MM-dd')));


                              generate 
                                        account_code, 
                                        serial_number,   
                                        reported_date,
                                        reported_date2,
                                        (CALC_UR_DAYS_BETWEEN > 15000 ? 0 : CALC_UR_DAYS_BETWEEN) AS days_since_last_reported;
                                        }
dump devicePageCountAll2;

Thanks!

Best answer

Instead of this

DEFINE DaysBetween EasyDates.EasyDateMethods;

try

DEFINE DaysBetween EasyDates.EasyDateMethods();

The trailing parentheses make the DEFINE an instantiation of the UDF class (calling its default constructor), which lets Pig resolve EasyDateMethods.exec.

Regarding "hadoop - UDF error with Pig - could not resolve using imports", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35854142/
