azure - Extract/remove/skip free text from a JSON file in Azure Data Factory

Tags: azure data-science azure-data-factory

I am trying to do some source transformations in ADF on a large number of server logs that have the following format:

#PartnerName    QA Server
#ApplicationName    T_GSPClient
#AccountName    DoNotModifyDMS
#SDK    desktop
#ClientVersion  5.1.1894.3
#InputChannel   DesktopMic
#User   JohnDoe
#NmsLogin   JohnDoe
#SessionId  7ba732d6-3445-4b16-b7e8-345fgd4f5g4
#ClientIP   209.122.69.109
#SRTechnology   S2
#SROptions  NoTextBefore
#GeneralLogLevel    Trace
#ModuleLogLevels    
#ServerDateTimeUTC  2023-07-06 15:28:33.105
{"date":"2023-07-06\t15:28:09.653","level":"TRACE","msg":"DMVAServerMessage-Initialize:{"dmhMessage":{"messageHeader":{"messageType":"unsubscribe","sessionId":"00000000-0000-0000-0000-000000000000","messageId":"aeb65e55-a7c0-4e96-a960-dc2a252d6b2c","transactionType":"acknowledgement","clientType":"ttsChannel","version":"1.0","application":"DesktopSDK"},"messageResponse":{"resultCode":"SERVER_ERROR","errorMessage":"The subscription id does not exist, unsubscribe did not remove an entry"}}} ","traceId":"409e0d44-ad50-4f17-84c7-0521e01e11fc","spanId":"e0750e44-ad50-4f17-90a9-3b6940e0294b","resource":{"module":".NET","class":"Nuance.SpeechAnywhere.Internal.DMVA.DMVAServerMessage","function":"Initialize","line":70,"pid":26612,"thread":"[28-27280]"}}
{"date":"2023-07-06\t15:28:09.653","level":"TRACE","msg":"DMVAServerMessage-Initialize:{"dmhMessage":{"messageHeader":{"messageType":"unsubscribe","sessionId":"00000000-0000-0000-0000-000000000000","messageId":"aeb65e55-a7c0-4e96-a960-dc2a252d6b2c","transactionType":"acknowledgement","clientType":"ttsChannel","version":"1.0","application":"DesktopSDK"},"messageResponse":{"resultCode":"SERVER_ERROR","errorMessage":"The subscription id does not exist, unsubscribe did not remove an entry"}}} ","traceId":"409e0d44-ad50-4f17-84c7-0521e01e11fc","spanId":"e0750e44-ad50-4f17-90a9-3b6940e0294b","resource":{"module":".NET","class":"Nuance.SpeechAnywhere.Internal.DMVA.DMVAServerMessage","function":"Initialize","line":70,"pid":26612,"thread":"[28-27280]"}}
{"date":"2023-07-06\t15:28:09.653","level":"TRACE","msg":"DMVAServerMessage-Initialize:{"dmhMessage":{"messageHeader":{"messageType":"unsubscribe","sessionId":"00000000-0000-0000-0000-000000000000","messageId":"aeb65e55-a7c0-4e96-a960-dc2a252d6b2c","transactionType":"acknowledgement","clientType":"ttsChannel","version":"1.0","application":"DesktopSDK"},"messageResponse":{"resultCode":"SERVER_ERROR","errorMessage":"The subscription id does not exist, unsubscribe did not remove an entry"}}} ","traceId":"409e0d44-ad50-4f17-84c7-0521e01e11fc","spanId":"e0750e44-ad50-4f17-90a9-3b6940e0294b","resource":{"module":".NET","class":"Nuance.SpeechAnywhere.Internal.DMVA.DMVAServerMessage","function":"Initialize","line":70,"pid":26612,"thread":"[28-27280]"}}
{"date":"2023-07-06\t15:28:09.653","level":"TRACE","msg":"DMVAServerMessage-Initialize:{"dmhMessage":{"messageHeader":{"messageType":"unsubscribe","sessionId":"00000000-0000-0000-0000-000000000000","messageId":"aeb65e55-a7c0-4e96-a960-dc2a252d6b2c","transactionType":"acknowledgement","clientType":"ttsChannel","version":"1.0","application":"DesktopSDK"},"messageResponse":{"resultCode":"SERVER_ERROR","errorMessage":"The subscription id does not exist, unsubscribe did not remove an entry"}}} ","traceId":"409e0d44-ad50-4f17-84c7-0521e01e11fc","spanId":"e0750e44-ad50-4f17-90a9-3b6940e0294b","resource":{"module":".NET","class":"Nuance.SpeechAnywhere.Internal.DMVA.DMVAServerMessage","function":"Initialize","line":70,"pid":26612,"thread":"[28-27280]"}}

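My goal is to remove/skip the leading free-text lines and keep the rest of the JSON data, then move it to another blob for further transformation. I tried a source transformation with a derived column, but my data flow still reports malformed JSON. I also tried the Copy data activity inside a ForEach with "enableSkipIncompatibleRow": true, but that did not work: it only works when I use a single file, not when I iterate over multiple files and try to skip/remove these lines.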

Best answer

To remove the first few non-JSON lines using an ADF data flow, follow the approach below.

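  • Use a delimited-text source dataset for the source transformation. Choose a column delimiter that does not occur anywhere in the text.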

  • Here I used the tilde ~ as the column delimiter, so each entire line ends up in a single column.

  • Then add a filter transformation with the condition substring(Column_name,1,1)=='{'. This removes every row that does not start with {.

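  • Then add a sink transformation, set the file name option to output to a single file, and specify a file name. Under Optimize, select Single partition.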

Here the sink is a delimited-text dataset with the column delimiter set to no delimiter and the quote character set to no quote character.

The output file then contains only the lines that start with {; the leading # header lines are gone.
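
For checking the expected result outside ADF, the filter step can be sketched in a few lines of Python. This is only an illustration of the same first-character test, not part of the data flow itself; the function name `keep_json_lines` and the shortened sample rows are assumptions made for the example.

```python
from typing import Iterable, Iterator


def keep_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines whose first character is '{'."""
    for line in lines:
        # Same test as substring(Column_name,1,1)=='{' in the filter:
        # the data flow substring() is 1-based, so only the first
        # character of the line is compared.
        if line[:1] == "{":
            yield line


# Shortened, hypothetical rows in the shape of the log shown in the question.
sample = [
    "#PartnerName    QA Server\n",
    "#ServerDateTimeUTC  2023-07-06 15:28:33.105\n",
    '{"date":"2023-07-06\\t15:28:09.653","level":"TRACE"}\n',
]

kept = list(keep_json_lines(sample))
print(len(kept))
```

Like the data flow condition, this drops empty lines and any line with whitespace before the {. It does not validate that the kept lines are well-formed JSON.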

About azure - Extract/remove/skip free text from a JSON file in Azure Data Factory: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/76660184/
