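hadoop - Hive UDF: converting a StringObjectInspector to a String

Tags: hadoop hive hiveql user-defined-functions hadoop2

I am writing a generic UDF. It works when I call the UDF directly, but when I combine it with other functions (distinct, max, min), the evaluate function is not even called.

I want to see what is happening, so I tried logging the values, but I need to know how to convert a StringObjectInspector to a String.

Code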

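import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;
// Assumption: the log4j Logger, since the code below calls logger.info(Object).
import org.apache.log4j.Logger;

// AES is the asker's own helper class; it is not shown in the question.
@Description(name = "Decrypt", value = "Decrypt the Given Column", extended = "SELECT Decrypt('Hello World!');")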
public class Decrypt extends GenericUDF {
    Logger logger = Logger.getLogger(getClass().getName());

    PrimitiveObjectInspector col;
    StringObjectInspector databaseName;
    StringObjectInspector schemaName;
    StringObjectInspector tableName;
    StringObjectInspector colName;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        System.out.println("******************************      initialize called    ******************************");
        logger.info("******************************      initialize called    ******************************");
        if (arguments.length != 5) {
            throw new UDFArgumentLengthException("Decrypt takes exactly 5 arguments: T, String, String, String, String");
        }

        ObjectInspector colObject = arguments[0];
        ObjectInspector databaseNameObject = arguments[1];
        ObjectInspector schemaNameObject = arguments[2];
        ObjectInspector tableNameObject = arguments[3];
        ObjectInspector colNameNameObject = arguments[4];


        if (    !(databaseNameObject instanceof StringObjectInspector) ||
                !(schemaNameObject instanceof StringObjectInspector) ||
                !(tableNameObject instanceof StringObjectInspector) ||
                !(colNameNameObject instanceof StringObjectInspector)
        ) {
            throw new UDFArgumentException("Error: databaseName, schemaName, tableName and colName should be String");
        }

        this.col = (PrimitiveObjectInspector) colObject;
        this.databaseName = (StringObjectInspector) databaseNameObject;
        this.tableName = (StringObjectInspector) tableNameObject;
        this.schemaName = (StringObjectInspector) schemaNameObject;
        this.colName = (StringObjectInspector) colNameNameObject;

        logger.info("******************************      initialize end    ******************************");
        logger.info(col.toString());
        logger.info(col);
        logger.info(databaseNameObject.toString());
        logger.info(databaseNameObject);
        logger.info(colName.toString());
        logger.info(colName);
        logger.info(colNameNameObject);
        logger.info(colNameNameObject.toString());
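        // evaluate() returns a Hadoop Text, so declare the writable string inspector
        // rather than the Java String one; the declared inspector must match the returned object.
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;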
    }

    @Override
    public Object evaluate(DeferredObject[] deferredObjects) throws HiveException {
        System.out.println("******************** Decrypt ********************");
        logger.info("******************** Decrypt ******************** ");
        // Read the first argument once; a null input yields a null result.
        Object value = col.getPrimitiveJavaObject(deferredObjects[0].get());
        if (value == null) {
            return null;
        }
        String stringToDecrypt = value.toString();
        String database = databaseName.getPrimitiveJavaObject(deferredObjects[1].get());
        String schema = schemaName.getPrimitiveJavaObject(deferredObjects[2].get());
        String table = tableName.getPrimitiveJavaObject(deferredObjects[3].get());
        // Named column so the local variable does not shadow the col field.
        String column = colName.getPrimitiveJavaObject(deferredObjects[4].get());

        return new Text(AES.decrypt(stringToDecrypt, database, schema, table, column));
    }

    @Override
    public String getDisplayString(String[] strings) {
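        // Hive shows this string in EXPLAIN output and error messages, so it should not be null.
        return "Decrypt(" + String.join(", ", strings) + ")";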
    }

}

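Best answer

Try using the getPrimitiveJavaObject method instead of toString. An ObjectInspector only describes how to read a value and holds no data itself, so calling toString on it prints the inspector rather than the string. Pass the row object to getPrimitiveJavaObject inside evaluate to get the actual String.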
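As a minimal sketch of that advice, the class below reads a String through a StringObjectInspector outside of any query. The class name InspectorDemo and the helpers readString and describe are hypothetical and exist only for illustration; the sketch assumes the Hive serde and Hadoop libraries are on the classpath. It also shows one case where a value may be available during initialize: when Hive passes a ConstantObjectInspector for a literal argument.

```java
import org.apache.hadoop.hive.serde2.objectinspector.ConstantObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

public class InspectorDemo {

    // Hypothetical helper: the inspector knows how to decode the value,
    // so the value has to be handed to it explicitly.
    public static String readString(StringObjectInspector oi, Object value) {
        return oi.getPrimitiveJavaObject(value);
    }

    // Hypothetical helper for logging inside initialize(): only a constant
    // inspector carries a value at that point; otherwise just the type is known.
    public static String describe(ObjectInspector oi) {
        if (oi instanceof ConstantObjectInspector) {
            Object constant = ((ConstantObjectInspector) oi).getWritableConstantValue();
            return oi.getTypeName() + " constant: " + constant;
        }
        return oi.getTypeName() + " (value known only per row)";
    }

    public static void main(String[] args) {
        // A Java String read through the Java string inspector.
        String fromJava = readString(
                PrimitiveObjectInspectorFactory.javaStringObjectInspector, "mydb");

        // A Hadoop Text read through the writable string inspector.
        String fromWritable = readString(
                PrimitiveObjectInspectorFactory.writableStringObjectInspector, new Text("mytable"));

        System.out.println(fromJava);
        System.out.println(fromWritable);
        System.out.println(describe(PrimitiveObjectInspectorFactory.javaStringObjectInspector));
    }
}
```

In the UDF from the question, the equivalent of readString is already what evaluate does with deferredObjects[i].get(), so the per-row values can be logged there. A helper like describe could replace the logger.info(colName) calls in initialize, which only print the inspector object.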

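As another thought on your problem, check the optimization flags:

  • Vectorization: hive.vectorized.execution, hive.vectorized.execution.enabled, hive.vectorized.execution.reduce.groupby.enabled
  • Cost-based optimization: hive.cbo.enable
  • Predicate pushdown: hive.optimize.ppd

Check whether these flags are enabled or disabled by typing set <option> in the Hive shell (for example, set hive.optimize.ppd;), then try toggling the value.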

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/62184519/
