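I've seen a few similar questions, but either the problem wasn't quite the same or the solutions didn't apply to my case, so I'm posting my question here.
I am parsing a table that contains a CSV line in the csv_line column.
The problem is that some columns contain a comma, which is also the field separator. Those columns are enclosed in double quotes.
The parsing I am doing is: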
with raw_sample as (
select 'field1,field3,"http://another.domain/abc/...eIds=111,222,333,444,...,",CustomerX,end' as csv_line)
select
regexp_extract(csv_line,'(,?(".*?"|[^,]*)){1}') as f1,
-- ... and so on for each field, with n standing for the field position
regexp_extract(csv_line,'(,?(".*?"|[^,]*)){n}') as fn
from raw_sample
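I have tried replacing the characters/commas.
I know OpenCSVSerde lets you define the separator and the quote and escape characters in the CREATE TABLE statement, but I am looking for a property I can set, or perhaps a regular expression, that splits the line correctly.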
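For reference, the OpenCSVSerde route would look roughly like the sketch below. The table name, column names, and location are hypothetical placeholders; the three SerDe properties are the ones documented for OpenCSVSerde, which reads every column as a string.

```sql
-- Sketch only: csv_parsed, its columns, and the LOCATION path are made up for illustration.
-- OpenCSVSerde treats every column as STRING regardless of the declared type.
CREATE EXTERNAL TABLE csv_parsed (
  f1       STRING,
  f2       STRING,
  url      STRING,
  customer STRING,
  f5       STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- field separator
  "quoteChar"     = "\"",  -- quoted fields may contain the separator
  "escapeChar"    = "\\"   -- escape character inside fields
)
STORED AS TEXTFILE
LOCATION '/path/to/csv/files';
```

This only helps when the CSV lines are the table's underlying files, not when they sit in a single string column as in my case.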
Thanks in advance.
Best answer
with raw_sample as (
select 'field1,field2,fiend3,123,456,"http://some.domain/abc/Player.aspx?playerID=111&BrowseIds=2221,423062611,423870887,424044345,...,",THIS_IS_MY,en,20 294 998 1001,end' as raw_line
)
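-- {NN} repeats "optional comma + field" NN times; capture group 2 keeps the last field matched, i.e. field NN
select regexp_extract(raw_line,'(,?(".*?"|[^,]*)){01}',2) as c01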
,regexp_extract(raw_line,'(,?(".*?"|[^,]*)){02}',2) as c02
,regexp_extract(raw_line,'(,?(".*?"|[^,]*)){03}',2) as c03
,regexp_extract(raw_line,'(,?(".*?"|[^,]*)){04}',2) as c04
,regexp_extract(raw_line,'(,?(".*?"|[^,]*)){05}',2) as c05
,regexp_extract(raw_line,'(,?(".*?"|[^,]*)){06}',2) as c06
,regexp_extract(raw_line,'(,?(".*?"|[^,]*)){07}',2) as c07
,regexp_extract(raw_line,'(,?(".*?"|[^,]*)){08}',2) as c08
,regexp_extract(raw_line,'(,?(".*?"|[^,]*)){09}',2) as c09
,regexp_extract(raw_line,'(,?(".*?"|[^,]*)){10}',2) as c10
from raw_sample
;
+--------+--------+--------+-----+-----+-----------------------------------------------------------------------------------------------------+------------+-----+-----------------+-----+
| c01 | c02 | c03 | c04 | c05 | c06 | c07 | c08 | c09 | c10 |
+--------+--------+--------+-----+-----+-----------------------------------------------------------------------------------------------------+------------+-----+-----------------+-----+
| field1 | field2 | fiend3 | 123 | 456 | "http://some.domain/abc/Player.aspx?playerID=111&BrowseIds=2221,423062611,423870887,424044345,...," | THIS_IS_MY | en | 20 294 998 1001 | end |
+--------+--------+--------+-----+-----+-----------------------------------------------------------------------------------------------------+------------+-----+-----------------+-----+
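Note that c06 comes back with its surrounding double quotes. If those are unwanted, one option is to wrap each extraction in regexp_replace. The following is a minimal sketch, assuming a shortened, made-up sample line and assuming that quoted fields never contain escaped double quotes:

```sql
-- Sketch only: the sample line below is invented for illustration.
with raw_sample as (
  select 'field1,"a,b,c",end' as raw_line
)
select
  -- inner call: extract field 2, still wrapped in its double quotes
  -- outer call: remove one leading and one trailing double quote, if present
  regexp_replace(
    regexp_extract(raw_line, '(,?(".*?"|[^,]*)){2}', 2),
    '^"|"$',
    ''
  ) as c02
from raw_sample
;
```

Unquoted fields pass through the outer call unchanged, so the same wrapper can be applied to every column.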
Regarding "csv - HIVE - manually parsing data enclosed in double quotes and separated by commas", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44955720/