I need clarification on how to use ReplaceTextWithMapping in NiFi for my specific case. My input file looks like this:
{"field1" : "A",
"field2" : "A",
"field3": "A"
}
The mapping file looks like this:
Header1;Header2;Header3
A;some text;2
My expected result is:
{"field1" : "some text",
"field2": "A",
"field3": "A2"
}
The regular expression is simply set to:
[A-Z0-9]+
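It matches the field keys in the mapping file (we expect uppercase letters, or uppercase letters plus digits), but I am not sure how you decide which value (from column 2 or column 3) an input value gets mapped to. Also, my field2 should not change: it needs to keep the value taken from the input, with no mapping involved. At the moment, I get something like this: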
{"field1" : "some text A2",
"field2": "some text A2",
"field3": "some text A2"
}
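I guess my main question is: can you map the same value in the input file to different values taken from different columns of the mapping file?
Thanks
Edit: I am using ReplaceTextWithMapping, an out-of-the-box processor in Apache NiFi (v. 0.5.1). In my data flow I end up with a JSON file, and I need to apply to it some mappings from an external file that I would like to load into memory (rather than, for example, parsing it with ExtractText).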
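Best answer
Foreword
It looks like you are working with a JSON string. It would be easier to handle such a string with a JSON parsing engine, because the JSON structure allows difficult edge cases that make parsing with regular expressions hard. That said, I am sure you have your reasons, and I am not the regex police.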
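To illustrate the parser-based route mentioned above, here is a minimal Python sketch that runs outside NiFi. It assumes the mapping file is `;`-delimited with a header row, as in the question, and that the per-field rules are "field1 takes column 2" and "field3 takes its own value followed by column 3". The function names `load_mapping` and `apply_mapping` and those rules are hypothetical, chosen only to reproduce the expected result from the question.

```python
import csv
import io
import json


def load_mapping(text):
    """Read a ';'-delimited mapping with a header row into {key: row}."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader)  # skip the Header1;Header2;Header3 line
    return {row[0]: row for row in reader if row}


def apply_mapping(doc, mapping):
    """Hypothetical rules: field1 <- column 2, field3 <- value + column 3.

    field2 is deliberately left untouched.
    """
    out = dict(doc)
    row = mapping.get(doc.get("field1"))
    if row is not None:
        out["field1"] = row[1]
    row = mapping.get(doc.get("field3"))
    if row is not None:
        out["field3"] = doc["field3"] + row[2]
    return out


mapping = load_mapping("Header1;Header2;Header3\nA;some text;2\n")
doc = json.loads('{"field1" : "A", "field2" : "A", "field3": "A"}')
result = apply_mapping(doc, mapping)
print(json.dumps(result))
```

Because each field is handled by name, the same input value "A" can map to a different column per field, which a single value-based regex cannot express.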
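Description
To do this kind of replacement, it is easier to capture both the substrings you want to keep and the substrings you want to replace.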
(\{"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+\})
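Replace with: $1SomeText$3$4$5A2$7
Note: I recommend using the following flags with this expression: case insensitive, and dot matches all characters including newlines.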
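The substitution can be tried outside NiFi with a short Python sketch. This is only an illustration of the expression, not of how the processor applies it: Python's `re` module writes `$1` as `\g<1>`, and the flags `re.IGNORECASE` and `re.DOTALL` stand in for the two flags recommended in the note. The variable names are assumptions made for the example.

```python
import re

# The expression from the answer, split across lines for readability.
PATTERN = re.compile(
    r'(\{"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+\})',
    re.IGNORECASE | re.DOTALL,
)

# Python spelling of $1SomeText$3$4$5A2$7.
# \g<5> is used instead of \5 so the following "A2" stays literal text.
REPLACEMENT = r"\g<1>SomeText\g<3>\g<4>\g<5>A2\g<7>"

source = '{"field1" : "A",\n"field2" : "A",\n"field3": "A"\n}'
result = PATTERN.sub(REPLACEMENT, source)
print(result)
```

Groups 2 and 6 (the first and third values) are dropped and replaced with literal text, while group 4 (the second value) is copied through unchanged.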
Example
Live Demo
This example shows how the regular expression matches the source text: https://regex101.com/r/vM1qE2/1
Source text
{"field1" : "A",
"field2" : "A",
"field3": "A"
}
After replacement
{"field1" : "SomeText",
"field2" : "A",
"field3": "A2"
}
Explanation
NODE                     EXPLANATION
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
\{                       '{'
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
[a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                         (1 or more times (matching the most
                         amount possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0
                         or more times (matching the most amount
                         possible))
----------------------------------------------------------------------
:                        ':'
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0
                         or more times (matching the most amount
                         possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
(                        group and capture to \2:
----------------------------------------------------------------------
[a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                         (1 or more times (matching the most
                         amount possible))
----------------------------------------------------------------------
)                        end of \2
----------------------------------------------------------------------
(                        group and capture to \3:
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
[,\r\n]+                 any character of: ',', '\r' (carriage
                         return), '\n' (newline) (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
[a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                         (1 or more times (matching the most
                         amount possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0
                         or more times (matching the most amount
                         possible))
----------------------------------------------------------------------
:                        ':'
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0
                         or more times (matching the most amount
                         possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
)                        end of \3
----------------------------------------------------------------------
(                        group and capture to \4:
----------------------------------------------------------------------
[a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                         (1 or more times (matching the most
                         amount possible))
----------------------------------------------------------------------
)                        end of \4
----------------------------------------------------------------------
(                        group and capture to \5:
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
[,\r\n]+                 any character of: ',', '\r' (carriage
                         return), '\n' (newline) (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
[a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                         (1 or more times (matching the most
                         amount possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0
                         or more times (matching the most amount
                         possible))
----------------------------------------------------------------------
:                        ':'
----------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0
                         or more times (matching the most amount
                         possible))
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
)                        end of \5
----------------------------------------------------------------------
(                        group and capture to \6:
----------------------------------------------------------------------
[a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                         (1 or more times (matching the most
                         amount possible))
----------------------------------------------------------------------
)                        end of \6
----------------------------------------------------------------------
(                        group and capture to \7:
----------------------------------------------------------------------
"                        '"'
----------------------------------------------------------------------
[,\r\n]+                 any character of: ',', '\r' (carriage
                         return), '\n' (newline) (1 or more times
                         (matching the most amount possible))
----------------------------------------------------------------------
\}                       '}'
----------------------------------------------------------------------
)                        end of \7
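
The group breakdown can be checked by printing what each of the seven captures holds for the sample input. This is a small Python sketch for inspection only; the variable names are assumptions, and only the case-insensitive flag is set because the expression contains no dot.

```python
import re

pattern = re.compile(
    r'(\{"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+\})',
    re.IGNORECASE,
)

source = '{"field1" : "A",\n"field2" : "A",\n"field3": "A"\n}'
match = pattern.search(source)

# Odd-numbered groups are the structure to keep; even-numbered groups are the values.
for number, text in enumerate(match.groups(), start=1):
    print(number, repr(text))
```

The even-numbered groups each capture one "A", which is why the replacement string can keep or overwrite each value independently.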
Source: "regex - Using ReplaceTextWithMapping for multiple columns in a mapping file" on Stack Overflow: https://stackoverflow.com/questions/37237848/