sql-server - Handling embedded newlines when creating/selecting an external table in SQL Data Warehouse

In SQL Data Warehouse (editors, please don't change this name; it is the actual product name, see: here) I have a JobCandidate_ext external table, defined as follows.

CREATE EXTERNAL TABLE [HumanResources].[JobCandidate_ext](
    [JobCandidateID] int,
    [BusinessEntityID] int,
    [Resume] Varchar(8000),
    [ModifiedDate] Datetime
)
WITH (
    LOCATION='/[HumanResources].[JobCandidate]/data.txt',
    DATA_SOURCE=AzureStorage,
    FILE_FORMAT=TextFile)
GO

The [Resume] column is of type XML in SQL Server, but SQL Data Warehouse does not support the XML type, so it has to be converted to varchar(8000), as described here.

I exported the data to blob storage as a flat file, data.txt, and then created the external table on top of it.

The [Resume] column contains carriage returns (as you would expect in an XML document), so running SELECT * FROM [HumanResources].[JobCandidate_ext] fails. In this case the error is:

Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 2 rows processed.
(/[HumanResources].[JobCandidate]/data.txt)Column ordinal: 0, Expected data type: INT, Offending value: some text .... (Column Conversion Error), Error: Error converting data type NVARCHAR to INT.

I know that the row delimiter cannot be configured when creating an external table, as described here:

The row delimiter must be UTF-8 and supported by Hadoop’s LineRecordReader. The row delimiter must be either '\r', '\n', or '\r\n'. These are not user-configurable.

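If you try wrapping each column value in quotes, selecting rows from the external table fails with this error instead: No closing string delimiter.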

Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
(/[HumanResources].[JobCandidate]/data.txt)Column ordinal: 2, Expected data type: VARCHAR(8000) collate SQL_Latin1_General_CP1_CI_AS, Offending value: 'ShaiBassli (Tokenization failed), Error: No closing string delimiter.
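The question does not show how the TextFile format is defined. The sketch below is a hypothetical DELIMITEDTEXT definition, assuming '|' as the field terminator and a single quote as the string delimiter (consistent with the 'ShaiBassli value in the second error). It illustrates why quoting does not help: FORMAT_OPTIONS lets you choose FIELD_TERMINATOR and STRING_DELIMITER, but offers no row-terminator option. As the two errors suggest, the file is cut into records at every '\r' or '\n' before string delimiters are taken into account, so a quoted [Resume] value spanning several lines leaves an unclosed quote on the first fragment, while the next fragment starts with resume text that lands in the INT column at ordinal 0.

```sql
-- Hypothetical definition of the TextFile format used by JobCandidate_ext.
-- The field terminator and string delimiter are assumptions, not taken from the question.
CREATE EXTERNAL FILE FORMAT TextFile
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = '|',    -- assumed column separator
        STRING_DELIMITER = '''',   -- a single quote; only honored within one line
        USE_TYPE_DEFAULT = FALSE   -- missing values are read as NULL
    )
    -- There is no row-terminator option: rows always end at '\r', '\n' or '\r\n'.
);
GO
```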

Is there a way to work around this?

Best answer

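Currently, PolyBase does not allow row or field delimiters inside a field; in other words, it gives you no way to escape these characters. As Greg pointed out, you can vote for this feature here: https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/10600132-polybase-allow-line-ends-within-qualified-text-f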

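To work around this limitation, you can preprocess the data before reading it with PolyBase (for example with sed or tr) to replace the unwanted characters. Alternatively, you can switch to one of the other file formats PolyBase supports (RCFile, ORC or Parquet) and avoid dealing with row and field delimiters altogether.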
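Both workarounds can be sketched in T-SQL under assumptions the answer does not spell out. Option 1 is a swapped-in variant of the sed/tr preprocessing: instead of editing data.txt after the fact, it replaces CR and LF in the export query itself (for example the query passed to bcp ... queryout), assuming the source is the HumanResources.JobCandidate table from the question. If the field terminator can also occur inside the XML, it needs the same treatment. Option 2 assumes the data has already been written to Parquet by some other tool; the format name ParquetFile, the table name JobCandidate_parquet and the LOCATION folder are hypothetical.

```sql
-- Option 1: strip embedded line breaks at export time (alternative to sed/tr).
SELECT
    [JobCandidateID],
    [BusinessEntityID],
    -- Serialize the XML to text, then turn CR and LF into spaces so that
    -- every record ends up on a single line of data.txt.
    REPLACE(REPLACE(CAST([Resume] AS nvarchar(max)), CHAR(13), N' '), CHAR(10), N' ') AS [Resume],
    [ModifiedDate]
FROM [HumanResources].[JobCandidate];
GO

-- Option 2: read a Parquet copy of the data; the columnar file has no row or field delimiters.
CREATE EXTERNAL FILE FORMAT ParquetFile                 -- hypothetical name
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
GO

CREATE EXTERNAL TABLE [HumanResources].[JobCandidate_parquet](   -- hypothetical name
    [JobCandidateID] int,
    [BusinessEntityID] int,
    [Resume] varchar(8000),
    [ModifiedDate] datetime
)
WITH (
    LOCATION = '/[HumanResources].[JobCandidate]/parquet/',   -- hypothetical folder
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = ParquetFile
);
GO
```

Option 1 alters the stored resume text slightly (line breaks become spaces), whereas Option 2 keeps it intact at the cost of producing Parquet files outside SQL Server.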

Regarding sql-server - Handling embedded newlines when creating/selecting an external table in SQL Data Warehouse, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36137241/
