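I created a table in Hive and loaded data into it from an external CSV file. When I try to print the data from Python, I get output like ['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']. When I query through the Hive GUI, the results are correct. How can I get the same results from a Python program?
My Python code: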
import pyhs2

with pyhs2.connect(host='192.168.56.101',
                   port=10000,
                   authMechanism='PLAIN',
                   user='hiveuser',
                   password='password',
                   database='anuvrat') as conn:
    with conn.cursor() as cur:
        cur.execute('SELECT message FROM ABC_NEWS LIMIT 5')
        print cur.fetchone()
The output is:
/usr/bin/python2.7 /home/anuvrattiku/SPRING_2017/CMPE239/Facebook_Fake_news_detection/code_fake_news/code.py
['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']
Process finished with exit code 0
When I query the same table in Hive, I get the following output:
This is how I created the table:
CREATE TABLE ABC_NEWS(
ID STRING,
PAGE_ID INT,
NAME STRING,
MESSAGE STRING,
DESCRIPTION STRING,
CAPTION STRING,
POST_TYPE STRING,
STATUS_TYPE STRING,
LIKES_COUNT SMALLINT,
COMMENTS SMALLINT,
SHARES_COUNT SMALLINT,
LOVE_COUNT SMALLINT,
WOW_COUNT SMALLINT,
HAHA_COUNT SMALLINT,
SAD_COUNT SMALLINT,
THANKFUL_COUNT SMALLINT,
ANGRY_COUNT SMALLINT,
LINK STRING,
IMAGE_LINK STRING,
POSTED_AT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';
The CSV file used to load the table is at the following path:
https://www.dropbox.com/s/fiwygyqt8u9eo5s/abc-news-86680728811.csv?dl=0
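Best answer
1. Your fields are enclosed in quotes ("), and the delimiter (,) appears inside the quoted text, so you should use the CSV SerDe.
2. cur.fetchone() returns a list, not a string, so printing it shows the escaped representation of its element, which is the byte-array-like output you got. You should have printed the first element of the list: cur.fetchone()[0]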
create external table abc_news
(
id string
,page_id int
,name string
,message string
,description string
,caption string
,post_type string
,status_type string
,likes_count smallint
,comments smallint
,shares_count smallint
,love_count smallint
,wow_count smallint
,haha_count smallint
,sad_count smallint
,thankful_count smallint
,angry_count smallint
,link string
,image_link string
,posted_at string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties
(
'separatorChar' = ','
,'quoteChar' = '"'
)
stored as textfile
;
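To see why a plain comma-delimited row format breaks on this kind of data, the sketch below compares a naive split on "," with a quote-aware parser. It uses Python's standard csv module only as a stand-in for the quote handling that a CSV SerDe provides; the sample line is made up for illustration and is not taken from the actual file.

```python
import csv

# Hypothetical sample row: the second field is quoted and contains a comma.
line = '1,"Were you an early adopter? If so, read on.",link'

# Naive split on the delimiter, roughly what FIELDS TERMINATED BY "," does.
naive_fields = line.split(',')

# Quote-aware parsing, analogous to setting separatorChar and quoteChar.
quoted_fields = next(csv.reader([line], delimiter=',', quotechar='"'))

print(len(naive_fields))   # 4: the quoted text was cut at its inner comma
print(len(quoted_fields))  # 3: the quoted text stays in one field
print(quoted_fields[1])
```

With the naive split, every column after the quoted text shifts by one position, which is why values end up in the wrong columns.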
>>> import pyhs2
>>>
>>> with pyhs2.connect(host='localhost',port=10000,authMechanism='PLAIN',user='cloudera',password='cloudera',database='local_db') as conn:
...     with conn.cursor() as cur:
...         cur.execute('SELECT message FROM ABC_NEWS LIMIT 10')
...         for i in cur.fetch():
...             print i[0]
...
...
...
"message"
"Roberts took the unusual step of devoting the majority of his annual report to the issue of judicial ethics."
"Do you agree with the new law?"
"Some pretty cool confetti will rain down on New York City celebrators."
NULL
"The pharmacy was held up by a man seeking prescription medication. "
NULL
"There were no immediate reports of damage or injuries."
"Were you an LCD screen early adopter? A settlement may be headed your way."
"As Americans get bigger, passenger limits are becoming more restrictive."
>>>
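The difference between printing the fetched row and printing its first element can be reproduced without a Hive connection. The sketch below is only an illustration: the hard-coded list stands in for what cur.fetchone() returned in the question, and the names row, value and visible are hypothetical.

```python
# Stand-in for the row returned by cur.fetchone() in the question:
# a one-element list whose string contains NUL characters.
row = ['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']

# Printing a list shows the repr() of each element, so the NULs
# appear as visible \x00 escape sequences.
as_list = str(row)

# The element itself is an ordinary string; printed directly, the NUL
# characters are written raw and most terminals display nothing for them.
value = row[0]

# The NULs are still part of the data; stripping them is shown here
# only to make the remaining text visible.
visible = value.replace('\x00', '')

print(as_list)
print(visible)  # "message"
```

Indexing the row changes how the value is displayed, not the value itself, so the NUL characters remain in the string unless they are removed explicitly.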
On the topic "python - Hive query from a Python program returns output like 'x00e\x00"\x00'", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43712292/