hadoop - Error while storing into HBase using Pig

Tags: hadoop hbase apache-pig store hortonworks-data-platform

Input data on HDFS (via cat):

[ituser1@genome-dev3 ~]$ hadoop fs -cat FOR_COPY/COMPETITOR_BROKERING/part-r-00000 | head -1

returns:
836646827,1000.0,2016-02-20,34,CAPITAL BOOK,POS/CAPITAL BOOK/NEW DELHI/200216/14:18,BOOKS AND STATIONERY,5497519004453567/41043516,MARRIED,M,SALARIED,D,5942,1

My Pig code:
DATA = LOAD 'FOR_COPY/COMPETITOR_BROKERING' USING PigStorage(',') AS (CUST_ID:chararray,TXN_AMT:chararray,TXN_DATE:chararray,AGE_CASA:chararray,MERCH_NAME:chararray,TXN_PARTICULARS:chararray,MCC_CATEGORY:chararray,TXN_REMARKS:chararray,MARITAL_STATUS_CASA:chararray,GENDER_CASA:chararray,OCCUPATION_CAT_V2_NEW:chararray,DR_CR:chararray,MCC_CODE:chararray,OCCURANCE:int);

DATA_FIL = FOREACH DATA GENERATE                
                (chararray)CUST_ID AS CUST_ID,
                (chararray)TXN_AMT AS TXN_AMT,
                (chararray)TXN_DATE AS TXN_DATE,
                (chararray)AGE_CASA AS AGE_CASA,
                (chararray)MERCH_NAME AS MERCH_NAME,
                (chararray)TXN_PARTICULARS AS TXN_PARTICULARS,
                (chararray)MCC_CATEGORY AS MCC_CATEGORY,
                (chararray)TXN_REMARKS AS TXN_REMARKS,
                (chararray)MARITAL_STATUS_CASA AS MARITAL_STATUS_CASA,
                (chararray)GENDER_CASA AS GENDER_CASA,
                (chararray)OCCUPATION_CAT_V2_NEW AS OCCUPATION_CAT_V2_NEW,
                (chararray)DR_CR AS DR_CR,
                (chararray)MCC_CODE AS MCC_CODE;

STORE DATA_FIL INTO 'hbase://TXN_EVENTS' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('DETAILS:CUST_ID DETAILS:TXN_AMT DETAILS:TXN_DATE DETAILS:AGE_CASA DETAILS:MERCH_NAME DETAILS:TXN_PARTICULARS DETAILS:MCC_CATEGORY DETAILS:TXN_REMARKS DETAILS:MARITAL_STATUS_CASA DETAILS:GENDER_CASA DETAILS:OCCUPATION_CAT_V2_NEW DETAILS:DR_CR DETAILS:MCC_CODE');

But it gives this error:
ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job job_1457792710587_0100 failed, hadoop does not return any error message
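A likely cause (an assumption on my part; the job log above does not say): HBaseStorage consumes the first field of each tuple as the HBase row key, and only the remaining fields are matched against the declared columns. DATA_FIL has 13 fields and the STORE declares 13 DETAILS:* columns, so after the row key is taken, only 12 values remain for 13 columns. A minimal Python sketch of that mapping rule (mirroring the behaviour, not the actual HBaseStorage code):

```python
# Sketch of how Pig's HBaseStorage maps a tuple to an HBase row.
# Assumption: first field = row key, remaining fields must match
# the declared column list one-to-one. Not the real implementation.

def map_tuple_to_row(tup, column_specs):
    """First field becomes the row key; the rest must match the columns."""
    row_key, values = tup[0], tup[1:]
    if len(values) != len(column_specs):
        raise ValueError(
            f"{len(column_specs)} columns declared but only "
            f"{len(values)} fields left after the row key")
    return row_key, dict(zip(column_specs, values))

# DATA_FIL has 13 fields, and the STORE lists 13 DETAILS:* columns:
fields = ["836646827", "1000.0", "2016-02-20"] + ["x"] * 10        # 13 fields
columns = ["DETAILS:CUST_ID", "DETAILS:TXN_AMT"] + ["DETAILS:c"] * 11  # 13 columns

try:
    map_tuple_to_row(fields, columns)
except ValueError as e:
    print(e)  # 13 columns declared but only 12 fields left after the row key
```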

But my LOAD works fine:
HDATA = LOAD 'hbase://TXN_EVENTS'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       'DETAILS:CUST_ID DETAILS:TXN_AMT DETAILS:TXN_DATE DETAILS:AGE_CASA DETAILS:MERCH_NAME DETAILS:TXN_PARTICULARS DETAILS:MCC_CATEGORY DETAILS:TXN_REMARKS DETAILS:MARITAL_STATUS_CASA DETAILS:GENDER_CASA DETAILS:OCCUPATION_CAT_V2_NEW DETAILS:DR_CR DETAILS:MCC_CODE','-loadKey true' )
       AS (ROWKEY:chararray,CUST_ID:chararray,TXN_AMT:chararray,TXN_DATE:chararray,AGE_CASA:chararray,MERCH_NAME:chararray,TXN_PARTICULARS:chararray,MCC_CATEGORY:chararray,TXN_REMARKS:chararray,MARITAL_STATUS_CASA:chararray,GENDER_CASA:chararray,OCCUPATION_CAT_V2_NEW:chararray,DR_CR:chararray,MCC_CODE:chararray);

DUMP HDATA; (this gives perfect results):
2016-03-01,1,20.0,2016-03-22,27,test_merch,test/particulars,test_category,test/remarks,married,M,service,D,1234

Thanks for the help.

I am using the Hortonworks stack in distributed mode:

HDP 2.3
Apache Pig version 0.15.0
HBase 1.1.1

All the jars are also in place, since I installed everything through Ambari.

Best Answer

Solved the data upload:

I was missing a RANK on the relation, so the HBase row key becomes the rank.

DATA_FIL_1 = RANK DATA_FIL_2;

NOTE: this will generate an arbitrary row key.
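Applied to the script in the question, the fix would look roughly like this (a sketch: RANK prepends a rank field to each tuple, which HBaseStorage then consumes as the row key, leaving the original 13 fields for the 13 declared columns):

```pig
-- Sketch: RANK prepends a rank field; HBaseStorage uses it as the row key
DATA_RANKED = RANK DATA_FIL;

STORE DATA_RANKED INTO 'hbase://TXN_EVENTS' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('DETAILS:CUST_ID DETAILS:TXN_AMT DETAILS:TXN_DATE DETAILS:AGE_CASA DETAILS:MERCH_NAME DETAILS:TXN_PARTICULARS DETAILS:MCC_CATEGORY DETAILS:TXN_REMARKS DETAILS:MARITAL_STATUS_CASA DETAILS:GENDER_CASA DETAILS:OCCUPATION_CAT_V2_NEW DETAILS:DR_CR DETAILS:MCC_CODE');
```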

But if you want to define the row key yourself, then use something like this:

You have to assign it to another relation; the STORE function alone won't work.
This will take the first field of each tuple as the row key (the one you have defined).

storage_data = STORE DATA_FIL INTO 'hbase://TXN_EVENTS' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('event_data:CUST_ID event_data:EVENT transaction_data:TXN_AMT transaction_data:TXN_DATE transaction_data:MERCH_NAME transaction_data:TXN_PARTICULARS transaction_data:MCC_CATEGORY transaction_data:TXN_REMARKS transaction_data:MARITAL_STATUS_CASA transaction_data:GENDER_CASA transaction_data:OCCUPATION_CAT_V2_NEW transaction_data:DR_CR transaction_data:MCC_CODE');
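Applied to the question's own table and column family, picking CUST_ID as an explicit row key would look roughly like this (a sketch using only the field, table, and column-family names from the question; note the row-key field is projected first and is NOT listed among the columns):

```pig
-- Sketch: project the intended row key (CUST_ID) as the first field;
-- HBaseStorage consumes it as the row key, so it is not in the column list.
KEYED = FOREACH DATA_FIL GENERATE CUST_ID, TXN_AMT, TXN_DATE, AGE_CASA, MERCH_NAME, TXN_PARTICULARS, MCC_CATEGORY, TXN_REMARKS, MARITAL_STATUS_CASA, GENDER_CASA, OCCUPATION_CAT_V2_NEW, DR_CR, MCC_CODE;

STORE KEYED INTO 'hbase://TXN_EVENTS' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('DETAILS:TXN_AMT DETAILS:TXN_DATE DETAILS:AGE_CASA DETAILS:MERCH_NAME DETAILS:TXN_PARTICULARS DETAILS:MCC_CATEGORY DETAILS:TXN_REMARKS DETAILS:MARITAL_STATUS_CASA DETAILS:GENDER_CASA DETAILS:OCCUPATION_CAT_V2_NEW DETAILS:DR_CR DETAILS:MCC_CODE');
```

With 12 columns declared and 12 fields remaining after the row key, the counts line up.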

This question, "hadoop - Error while storing into HBase using Pig", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/36152391/
