azure-data-lake - U-SQL: filtering out empty/null strings (Microsoft Academic Graph)

Tags: azure-data-lake u-sql

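I am new to U-SQL in Azure Data Lake Analytics. I want to do something I thought was very simple, but I ran into trouble. Basically, I want to write a query that ignores empty strings. Using the check in the SELECT works, but using it in the WHERE clause does not.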

Below are the statements I wrote and the cryptic error I get.

Job

@xsel_res_1 = 
EXTRACT 
x_paper_id  long,
x_Rank  uint,
x_doi   string,
x_doc_type  string,
x_paper_title   string,
x_original_title    string,
x_book_title    string,
x_paper_year    int,
x_paper_date    DateTime?,
x_publisher string,
x_journal_id    long?,
x_conference_series_id  long?,
x_conference_instance_id    long?,
x_volume    string,
x_issue string,
x_first_page    string,
x_last_page string,
x_reference_count   long,
x_citation_count    long?,
x_estimated_citation    int?
FROM @"adl://xmag.azuredatalakestore.net/graph/2018-02-02/Papers.txt"
USING Extractors.Tsv()
; 

@xsel_res_2 = 
SELECT 
x_paper_id        AS x_paper_id,
x_doi.ToLower()   AS x_doi,
x_doi.Length     AS x_doi_length
FROM @xsel_res_1
WHERE NOT string.IsNullOrEmpty(x_doi)
;

@xsel_res_3 = 
SELECT 
* 
FROM @xsel_res_2
SAMPLE ANY (5)
;

OUTPUT @xsel_res_3
TO @"/graph/2018-02-02/x_output/x_papers_x6.tsv"
USING Outputters.Tsv();

Error

Vertex failed
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0][1]             with error: Vertex user code error.
VertexFailedFast: Vertex failed with a fail-fast error

E_RUNTIME_USER_EXTRACT_ROW_ERROR: Error occurred while extracting row    after processing 10 record(s) in the vertex' input split. Column index: 5, column name: 'x_original_title'.

E_RUNTIME_USER_EXTRACT_EXTRACT_INVALID_CHARACTER_AFTER_QUOTED_FIELD:     Invalid character following the ending quote character in a quoted field.

Component: RUNTIME
Message: Invalid character following the ending quote character in a quoted field.
Resolution: Column should be fully surrounded with double-quotes and double-quotes within the field escaped as two double-quotes.
Description: Invalid character is detected following the ending quote character in a quoted field. A column delimiter, row delimiter or EOF is expected. This error can occur if double-quotes within the field are not correctly escaped as two double-quotes.
Details:
Row Delimiter: 0x0
Column Delimiter: 0x9
HEX: 61 76 6E 69 20 74 65 72 6D 69 6E 20 75 20 70 6F 76 61 6C 6A 73 6B 6F 6A 20 6C 69 73 74 69 6E 69 20 69 20 6E 61 74 70 69 73 75 20 67 20 31 31 38 35 09 22 50 6F 20 6B 6F 6E 63 75 22 ### 20 28 73 74 61 72 69 20 68 72

Update: by the way, these operations work on other datasets, so as far as I can tell the problem is not the syntax.

 //Define schema of file, must map all columns
 @searchlog = 
 EXTRACT UserId          int, 
        Start           DateTime, 
        Region          string, 
        Query           string, 
        Duration        int, 
        Urls            string, 
        ClickedUrls     string
FROM @"/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();


 @searchlog_1 =
 SELECT * FROM  @searchlog
 WHERE NOT string.IsNullOrEmpty(ClickedUrls );


 OUTPUT @searchlog_1
   TO @"/Samples/Output/SearchLog_output_x1.tsv"
    USING Outputters.Tsv();

Best answer

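The way this error is surfaced is unfortunate in this case.

Assuming the text is UTF-8, you can use a site like www.hexutf8.com to convert the hex to the following (the 09 byte just before the opening quote is a tab, the column delimiter, shown here as a space):

avni termin u povaljskoj listini i natpisu g 1185 "Po koncu" (stari hr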
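The same decoding can also be done locally instead of through a website. The following is a minimal Python sketch, assuming UTF-8 as above; the helper name `decode_hex_dump` is made up for illustration, and the hex string is copied from the HEX line of the error details, where `###` marks the position at which the extractor gave up.

```python
# Hex dump copied from the "HEX:" line of the error details.
# "###" marks the position where the extractor reported the problem.
hex_dump = (
    "61 76 6E 69 20 74 65 72 6D 69 6E 20 75 20 70 6F 76 61 6C 6A 73 6B 6F 6A "
    "20 6C 69 73 74 69 6E 69 20 69 20 6E 61 74 70 69 73 75 20 67 20 31 31 38 35 "
    "09 22 50 6F 20 6B 6F 6E 63 75 22 ### 20 28 73 74 61 72 69 20 68 72"
)


def decode_hex_dump(dump, encoding="utf-8"):
    """Return (text before the ### marker, text after it)."""
    before, _, after = dump.partition("###")
    # bytes.fromhex skips the spaces between the byte pairs.
    return (
        bytes.fromhex(before).decode(encoding),
        bytes.fromhex(after).decode(encoding),
    )


before, after = decode_hex_dump(hex_dump)
# repr() makes the tab delimiter visible as \t.
print(repr(before))
print(repr(after))
```

Printing with `repr` shows the tab as `\t` directly in front of the opening quote, which is easy to miss when the decoded text is shown with plain spaces.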

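It looks like the input line contains at least one " character that is not correctly escaped. The bytes 09 22 in the hex dump show that the field begins with a quote right after the tab delimiter, so the extractor treats the second " as the end of the quoted field and then rejects the space that follows it. With the quotes doubled it should look like this (the Resolution text in the error additionally asks for the whole field to be enclosed in double quotes):

avni termin u povaljskoj listini i natpisu g 1185 ""Po koncu"" (stari hr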
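As a sketch of the escaping rule quoted in the error's Resolution text (wrap the field in double quotes and double any quotes inside it), the following Python snippet rewrites one tab-separated line. It is only an illustration under stated assumptions: the function names `quote_tsv_field` and `escape_tsv_line` are hypothetical, the input is assumed to be raw text with no valid quoting already present, and the sample field is limited to the fragment visible in the error, which is cut off after "hr".

```python
def quote_tsv_field(value):
    """Wrap a field in double quotes and double any quotes inside it."""
    return '"' + value.replace('"', '""') + '"'


def escape_tsv_line(line):
    """Quote only the tab-separated fields that contain a double quote.

    Assumes the line is raw, unquoted text; fields without quotes are
    passed through unchanged.
    """
    fields = line.split("\t")
    return "\t".join(
        quote_tsv_field(field) if '"' in field else field
        for field in fields
    )


# Fragment from the error message; the real row continues past "hr".
raw = 'natpisu g 1185\t"Po koncu" (stari hr'
escaped = escape_tsv_line(raw)
print(escaped)
```

Applying something like this means rewriting the input file before the U-SQL job reads it, so it is a pre-processing step rather than a change to the query itself.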

Regarding azure-data-lake - U-SQL: filtering out empty/null strings (Microsoft Academic Graph), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49199983/
