I need some clarification on how to use ReplaceTextWithMapping in NiFi for my specific case. My input file looks like this:

{"field1" : "A",
"field2" : "A",
"field3": "A"
}

The mapping file looks like this:

 Header1;Header2;Header3
 A;some text;2

My expected result is:

{"field1" : "some text",
"field2": "A",
"field3": "A2"
}

The regular expression is simply:

[A-Z0-9]+

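It matches the keys in the mapping file (we expect uppercase letters, or uppercase letters plus digits), but I'm not sure how you decide which value (from column 2 or column 3) the input value gets assigned. Also, my field2 should not change: it must keep the value it got from the input, without any mapping. Currently, I get something like this: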

{"field1" : "some text A2",
"field2": "some text A2",
"field3": "some text A2"
}
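To see why every field ends up with the same replacement, here is a rough simulation of regex-driven mapping in general, written in Python. It is not NiFi's source code: it only assumes that each regex match is looked up as a key in a single key-to-value map, and the helper `replace_with_mapping` is hypothetical. Under that model, identical keys always get identical values, and there is no way to skip field2.

```python
import re

# Hypothetical model of regex-driven mapping (NOT NiFi's implementation):
# every match of the pattern is looked up in ONE key -> value map.
MAPPING = {"A": "some text"}  # a single value per key is all this model allows

INPUT = '{"field1" : "A",\n"field2" : "A",\n"field3": "A"\n}'

def replace_with_mapping(text: str, pattern: str, mapping: dict) -> str:
    # Replace each match with its mapped value; unmapped matches are kept as-is.
    return re.sub(pattern, lambda m: mapping.get(m.group(0), m.group(0)), text)

result = replace_with_mapping(INPUT, r"[A-Z0-9]+", MAPPING)
print(result)
```

Note that `[A-Z0-9]+` also matches the digits in `field1` to `field3`; in this simulation they survive only because `1`, `2` and `3` are not keys in the map.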

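I guess my main question is: can you map the same value from the input file to different values taken from different columns of the mapping file?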

Thanks

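EDIT: I'm using ReplaceTextWithMapping, an out-of-the-box processor in Apache NiFi (v. 0.5.1). Over the whole data flow I end up with a JSON file, to which I need to apply some mappings from an external file that I want to load into memory (instead of, for example, parsing with ExtractText).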

Best answer

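It looks like you are working with JSON strings. Such strings are easier to process with a JSON parser, because the JSON structure allows difficult edge cases that make parsing with regular expressions hard. That said, I'm sure you have your reasons, and I'm not the regex police.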
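For comparison, a minimal sketch of the parser-based route, in Python outside NiFi. It reads the semicolon-separated mapping into key → remaining columns, parses the JSON, and applies a different column per field. The per-field rules (field1 takes column 2, field2 is left alone, field3 becomes the original value plus column 3) are assumptions inferred from the expected output in the question, and `load_mapping` / `apply_mapping` are hypothetical helpers, not NiFi features.

```python
import json

MAPPING_TEXT = " Header1;Header2;Header3\n A;some text;2\n"
INPUT = '{"field1" : "A",\n"field2" : "A",\n"field3": "A"\n}'

def load_mapping(text: str) -> dict:
    """Parse a ';'-separated file with a header row into key -> list of columns."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rows = {}
    for line in lines[1:]:  # skip the header row
        key, *rest = line.split(";")
        rows[key] = rest
    return rows

def apply_mapping(doc: dict, mapping: dict) -> dict:
    out = dict(doc)
    # Assumed rules, inferred from the expected output:
    # field1 <- column 2; field2 untouched; field3 <- original value + column 3.
    if doc["field1"] in mapping:
        out["field1"] = mapping[doc["field1"]][0]
    if doc["field3"] in mapping:
        out["field3"] = doc["field3"] + mapping[doc["field3"]][1]
    return out

result = apply_mapping(json.loads(INPUT), load_mapping(MAPPING_TEXT))
print(json.dumps(result))
```

Because the lookup is done per field, the same input value `A` can map to column 2 in one place and to column 3 in another, which a single key-to-value lookup cannot express.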

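Description

To make this kind of replacement, it is easier to capture both the substrings you want to keep and the substrings you want to replace, and then reassemble them in the replacement string.

(\{"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+\})

Replace with: $1SomeText$3$4$5A2$7

Groups 2 and 6 (the values of field1 and field3) are discarded and replaced by literal text, while group 4 (the value of field2) is written back unchanged.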


Note: I recommend using the following flags with this expression: case-insensitive, and dot matches all characters including newlines.
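A minimal sketch that checks the pattern and replacement against the sample input using Python's `re` module. This is a stand-in: NiFi itself uses Java regular expressions, where the replacement is written `$1SomeText$3$4$5A2$7`; Python spells the same group references as `\g<n>`. `re.IGNORECASE | re.DOTALL` mirror the recommended flags; the pattern contains no `.`, so `re.DOTALL` changes nothing here.

```python
import re

# The pattern from this answer, split across lines only for readability.
PATTERN = (r'(\{"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
           r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
           r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
           r'("[,\r\n]+\})')
# Java's "$1SomeText$3$4$5A2$7" written in Python's \g<n> syntax.
REPLACEMENT = r"\g<1>SomeText\g<3>\g<4>\g<5>A2\g<7>"

source = '{"field1" : "A",\n"field2" : "A",\n"field3": "A"\n}'
result = re.sub(PATTERN, REPLACEMENT, source, flags=re.IGNORECASE | re.DOTALL)
print(result)
```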

Example

Live demo

This example shows how the regular expression matches the source text: https://regex101.com/r/vM1qE2/1

Source text

{"field1" : "A",
"field2" : "A",
"field3": "A"
}

After replacement

{"field1" : "SomeText",
"field2" : "A",
"field3": "A2"
}

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \{                       '{'
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [,\r\n]+                 any character of: ',', '\r' (carriage
                             return), '\n' (newline) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [,\r\n]+                 any character of: ',', '\r' (carriage
                             return), '\n' (newline) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  (                        group and capture to \6:
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \6
----------------------------------------------------------------------
  (                        group and capture to \7:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [,\r\n]+                 any character of: ',', '\r' (carriage
                             return), '\n' (newline) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    \}                       '}'
----------------------------------------------------------------------
  )                        end of \7
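The table above can also be kept next to the pattern itself. A minimal sketch using Python's `re.VERBOSE` (an assumption; Java offers a comparable `(?x)` / `Pattern.COMMENTS` mode): unescaped whitespace outside character classes is ignored and `#` starts a comment, so each capture group carries a short note. It is the same pattern, only laid out differently.

```python
import re

# Same pattern, laid out with re.VERBOSE: spaces are ignored outside
# character classes and "#" starts a comment.
VERBOSE_PATTERN = re.compile(r"""
    ( \{ " [a-z0-9]+ " \s* : \s* " )          # group 1: opening brace + first key (kept)
    ( [a-z0-9]+ )                             # group 2: first value (replaced)
    ( " [,\r\n]+ " [a-z0-9]+ " \s* : \s* " )  # group 3: separator + second key (kept)
    ( [a-z0-9]+ )                             # group 4: second value (kept)
    ( " [,\r\n]+ " [a-z0-9]+ " \s* : \s* " )  # group 5: separator + third key (kept)
    ( [a-z0-9]+ )                             # group 6: third value (replaced)
    ( " [,\r\n]+ \} )                         # group 7: closing quote and brace (kept)
""", re.IGNORECASE | re.VERBOSE)

SOURCE = '{"field1" : "A",\n"field2" : "A",\n"field3": "A"\n}'
match = VERBOSE_PATTERN.fullmatch(SOURCE)
for number, text in enumerate(match.groups(), start=1):
    print(number, repr(text))
```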

Original question on Stack Overflow, "regex - Using ReplaceTextWithMapping with multiple columns in the mapping file": https://stackoverflow.com/questions/37237848/
