I am trying to do some source transformations in ADF (Azure Data Factory), where I have many server logs in the following format:
#PartnerName QA Server
#ApplicationName T_GSPClient
#AccountName DoNotModifyDMS
#SDK desktop
#ClientVersion 5.1.1894.3
#InputChannel DesktopMic
#User JohnDoe
#NmsLogin JohnDoe
#SessionId 7ba732d6-3445-4b16-b7e8-345fgd4f5g4
#ClientIP 209.122.69.109
#SRTechnology S2
#SROptions NoTextBefore
#GeneralLogLevel Trace
#ModuleLogLevels
#ServerDateTimeUTC 2023-07-06 15:28:33.105
{"date":"2023-07-06\t15:28:09.653","level":"TRACE","msg":"DMVAServerMessage-Initialize:{"dmhMessage":{"messageHeader":{"messageType":"unsubscribe","sessionId":"00000000-0000-0000-0000-000000000000","messageId":"aeb65e55-a7c0-4e96-a960-dc2a252d6b2c","transactionType":"acknowledgement","clientType":"ttsChannel","version":"1.0","application":"DesktopSDK"},"messageResponse":{"resultCode":"SERVER_ERROR","errorMessage":"The subscription id does not exist, unsubscribe did not remove an entry"}}} ","traceId":"409e0d44-ad50-4f17-84c7-0521e01e11fc","spanId":"e0750e44-ad50-4f17-90a9-3b6940e0294b","resource":{"module":".NET","class":"Nuance.SpeechAnywhere.Internal.DMVA.DMVAServerMessage","function":"Initialize","line":70,"pid":26612,"thread":"[28-27280]"}}
{"date":"2023-07-06\t15:28:09.653","level":"TRACE","msg":"DMVAServerMessage-Initialize:{"dmhMessage":{"messageHeader":{"messageType":"unsubscribe","sessionId":"00000000-0000-0000-0000-000000000000","messageId":"aeb65e55-a7c0-4e96-a960-dc2a252d6b2c","transactionType":"acknowledgement","clientType":"ttsChannel","version":"1.0","application":"DesktopSDK"},"messageResponse":{"resultCode":"SERVER_ERROR","errorMessage":"The subscription id does not exist, unsubscribe did not remove an entry"}}} ","traceId":"409e0d44-ad50-4f17-84c7-0521e01e11fc","spanId":"e0750e44-ad50-4f17-90a9-3b6940e0294b","resource":{"module":".NET","class":"Nuance.SpeechAnywhere.Internal.DMVA.DMVAServerMessage","function":"Initialize","line":70,"pid":26612,"thread":"[28-27280]"}}
{"date":"2023-07-06\t15:28:09.653","level":"TRACE","msg":"DMVAServerMessage-Initialize:{"dmhMessage":{"messageHeader":{"messageType":"unsubscribe","sessionId":"00000000-0000-0000-0000-000000000000","messageId":"aeb65e55-a7c0-4e96-a960-dc2a252d6b2c","transactionType":"acknowledgement","clientType":"ttsChannel","version":"1.0","application":"DesktopSDK"},"messageResponse":{"resultCode":"SERVER_ERROR","errorMessage":"The subscription id does not exist, unsubscribe did not remove an entry"}}} ","traceId":"409e0d44-ad50-4f17-84c7-0521e01e11fc","spanId":"e0750e44-ad50-4f17-90a9-3b6940e0294b","resource":{"module":".NET","class":"Nuance.SpeechAnywhere.Internal.DMVA.DMVAServerMessage","function":"Initialize","line":70,"pid":26612,"thread":"[28-27280]"}}
{"date":"2023-07-06\t15:28:09.653","level":"TRACE","msg":"DMVAServerMessage-Initialize:{"dmhMessage":{"messageHeader":{"messageType":"unsubscribe","sessionId":"00000000-0000-0000-0000-000000000000","messageId":"aeb65e55-a7c0-4e96-a960-dc2a252d6b2c","transactionType":"acknowledgement","clientType":"ttsChannel","version":"1.0","application":"DesktopSDK"},"messageResponse":{"resultCode":"SERVER_ERROR","errorMessage":"The subscription id does not exist, unsubscribe did not remove an entry"}}} ","traceId":"409e0d44-ad50-4f17-84c7-0521e01e11fc","spanId":"e0750e44-ad50-4f17-90a9-3b6940e0294b","resource":{"module":".NET","class":"Nuance.SpeechAnywhere.Internal.DMVA.DMVAServerMessage","function":"Initialize","line":70,"pid":26612,"thread":"[28-27280]"}}
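My goal is to remove/skip the leading free-text lines, keep the remaining JSON data, and then move it to another blob for further transformation. I tried a derived column after the source transformation, but my data flow still reports that the JSON is malformed. I also tried a Copy data activity inside a ForEach with "enableSkipIncompatibleRow": true, but that does not work either. It only works when I use a single file, not when I iterate over multiple files and try to skip/remove these lines.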
Best answer
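To remove the leading rows that are not in JSON format using an ADF data flow, follow the approach below.
- Add a source transformation that uses a delimited text source dataset. Choose a column delimiter that never occurs in the text.
- Here, I used the tilde (~) as the column delimiter, so each whole line lands in a single column.
- Then add a filter transformation with the condition substring(Column_name,1,1)=='{'. This removes every row that does not start with {.
- Then add a sink transformation, set the File name option to Output to single file, and give a file name. Under Optimize, select Single partition.
Here, I used a delimited dataset for the sink file, with the column delimiter set to No delimiter and the quote character set to No quote character.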
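For checking the filter logic outside ADF, the same rule can be sketched in a few lines of Python. This is only an illustration of the condition used in the filter transformation, not part of the data flow itself; the function name keep_json_lines and the shortened sample rows are assumptions made for this example.

```python
from typing import Iterable, Iterator


def keep_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines whose first character is '{'."""
    for line in lines:
        # Mirrors the data flow condition substring(Column_name,1,1)=='{'.
        # The data flow substring is 1-based; Python slicing is 0-based.
        if line[:1] == "{":
            yield line


# Shortened, hypothetical stand-ins for the header and log rows shown above.
sample = [
    "#PartnerName QA Server",
    "#ServerDateTimeUTC 2023-07-06 15:28:33.105",
    '{"date":"2023-07-06\\t15:28:09.653","level":"TRACE"}',
    '{"date":"2023-07-06\\t15:28:09.653","level":"TRACE"}',
]

filtered = list(keep_json_lines(sample))
for row in filtered:
    print(row)
```

Like the filter transformation, this only looks at the first character of each line and does not check whether the remaining text is valid JSON.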
The output file then contains only the rows that start with {.
Source: the Stack Overflow question "azure - Extract/Remove/Skip free text from Json file in Azure Data Factory": https://stackoverflow.com/questions/76660184/