regex - Using ReplaceTextWithMapping with multiple columns in a mapping file

Tags: regex replace apache-nifi

I need some clarification on how to use ReplaceTextWithMapping in NiFi for my particular case. My input file looks like this:

{"field1" : "A",
"field2" : "A",
"field3": "A"
}

The mapping file looks like this:

 Header1;Header2;Header3
 A;some text;2

My expected result is:

   {"field1" : "some text",
    "field2": "A",
    "field3": "A2"
    }

The regular expression is simply:

[A-Z0-9]+

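and it matches the keys in the mapping file (we expect uppercase letters, or uppercase letters plus digits). However, I'm not sure how you decide which value (from column 2 or column 3) an input value should be mapped to. Also, field2 must not change: it needs to keep the value it has in the input, without going through the mapping at all. Currently I get something like this: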

  {"field1" : "some text A2",
    "field2": "some text A2",
    "field3": "some text A2"
    }

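I guess my main question is: can you map the same value in the input file to different values taken from different columns of the mapping file?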
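To see why every field ends up with the same text, it helps to look at a single-key lookup in isolation. The sketch below is a simplified model written as an assumption, not taken from NiFi's source: each match of the regex is used as a key, and whatever value that key maps to is substituted everywhere the key occurs, regardless of which JSON field it sits in. The mapping value `some text` is just illustrative data. Note that the case-sensitive pattern `[A-Z0-9]+` also matches the digits in `field1`, `field2` and `field3`; in this model unmapped matches are left unchanged.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model (an assumption, not NiFi's implementation) of a
// key -> single value lookup applied to every regex match.
class SingleValueLookup {
    static String replaceAll(String input, String regex, Map<String, String> mapping) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // The matched text is the key; unmapped matches (e.g. the "1" in "field1") stay as-is.
            String value = mapping.getOrDefault(m.group(), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "{\"field1\" : \"A\",\n\"field2\" : \"A\",\n\"field3\": \"A\"\n}";
        // One key, one value: every "A" is replaced identically, including field2's.
        System.out.println(replaceAll(input, "[A-Z0-9]+", Map.of("A", "some text")));
    }
}
```

Because the lookup only sees the matched text `A`, it has no way to tell field1's `A` from field3's, which is why a per-column choice cannot be expressed with a plain key/value lookup like this.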

Thanks

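Edit: I'm using ReplaceTextWithMapping, an out-of-the-box processor in Apache NiFi (v. 0.5.1). Along my data flow I end up with a JSON file, to which I need to apply some mappings from an external file that I want to load into memory (rather than, for example, parsing it with ExtractText).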

Best answer

Foreword

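It looks like you're working with JSON strings. Such strings are easier to handle with a JSON parsing engine, because the JSON structure allows difficult edge cases that make parsing with regular expressions hard. That said, I'm sure you have your reasons, and I'm not the regex police.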
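If the JSON is parsed first, choosing a different mapping column per field becomes a plain lookup. The sketch below assumes the document has already been turned into a `Map` by some JSON library (not shown), and it parses the `;`-separated mapping file from the question, skipping the header line. The per-field rules are hypothetical, inferred from the expected output: field1 takes column 2, field3 takes its input value followed by column 3, and field2 is left alone. `PerFieldMapping`, `parseMapping` and `apply` are illustrative names, not a NiFi API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-field mapping over already-parsed JSON fields.
class PerFieldMapping {
    // Parses "key;col2;col3" lines into key -> all columns; the first line is the header.
    static Map<String, String[]> parseMapping(List<String> lines) {
        Map<String, String[]> byKey = new HashMap<>();
        if (lines.isEmpty()) return byKey;
        for (String line : lines.subList(1, lines.size())) {
            String[] cols = line.trim().split(";", -1);
            if (cols.length >= 3) byKey.put(cols[0], cols); // skip blank or short lines
        }
        return byKey;
    }

    static Map<String, String> apply(Map<String, String> fields, Map<String, String[]> mapping) {
        Map<String, String> out = new LinkedHashMap<>(fields);
        String[] row1 = mapping.get(fields.get("field1"));
        if (row1 != null) out.put("field1", row1[1]); // assumed rule: column 2 replaces the value
        String[] row3 = mapping.get(fields.get("field3"));
        if (row3 != null) out.put("field3", fields.get("field3") + row3[2]); // assumed rule: input + column 3
        // field2 is deliberately not looked up, so it keeps its input value.
        return out;
    }

    public static void main(String[] args) {
        Map<String, String[]> mapping =
            parseMapping(List.of(" Header1;Header2;Header3", " A;some text;2"));
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("field1", "A");
        fields.put("field2", "A");
        fields.put("field3", "A");
        System.out.println(apply(fields, mapping)); // {field1=some text, field2=A, field3=A2}
    }
}
```

The key difference from a regex lookup is that the field name is known at the moment the value is replaced, so each field can consult a different column.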

Description

For this kind of replacement, it's easier to capture both the substrings you want to keep and the substrings you want to replace.

(\{"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+\})

Replace with: $1SomeText$3$4$5A2$7


Note: I recommend using the following flags with this expression: case-insensitive, and dot matches all characters including newlines.
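The case-insensitive flag is what lets the lowercase classes `[a-z0-9]+` match the uppercase values `A`. In Java's regex engine the flags can also be written inline at the start of the pattern, `(?i)` for case-insensitive and `(?s)` for dot-matches-all, which may be handy where a tool takes only the expression text and offers no separate flag setting (whether that applies to a given NiFi processor is an assumption to check against its documentation). `FlagDemo` and `matchesValue` are illustrative names.

```java
import java.util.regex.Pattern;

// Shows the effect of the inline case-insensitive flag on the value class used above.
class FlagDemo {
    static boolean matchesValue(String regex, String value) {
        return Pattern.compile(regex).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesValue("[a-z0-9]+", "A"));      // false: class is lowercase only
        System.out.println(matchesValue("(?i)[a-z0-9]+", "A"));  // true: inline CASE_INSENSITIVE
        System.out.println(matchesValue("(?is)[a-z0-9]+", "A")); // true: CASE_INSENSITIVE + DOTALL
    }
}
```

Since the expression contains no `.`, the dot-matches-all flag has no effect on this particular pattern; newlines are handled explicitly by `[,\r\n]+`.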

Example

Live Demo

This example shows how the regular expression matches the source text: https://regex101.com/r/vM1qE2/1

Source text

{"field1" : "A",
"field2" : "A",
"field3": "A"
}

After replacement

{"field1" : "SomeText",
"field2" : "A",
"field3": "A2"
}
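The before/after example above can be reproduced outside regex101 with Java's `java.util.regex`, whose `$n` group references match the replacement string used in this answer. The sketch below compiles the pattern with `CASE_INSENSITIVE | DOTALL` as recommended and applies `$1SomeText$3$4$5A2$7`. `AnswerRegex` and `apply` are illustrative names. Keep in mind that the pattern is tied to this exact three-field layout; for example, `[,\r\n]+` does not allow indentation spaces between fields.

```java
import java.util.regex.Pattern;

// Applies the answer's pattern and replacement with Java's regex engine.
class AnswerRegex {
    static final Pattern PATTERN = Pattern.compile(
        "(\\{\"[a-z0-9]+\"\\s*:\\s*\")([a-z0-9]+)"          // \1 prefix, \2 field1 value
        + "(\"[,\\r\\n]+\"[a-z0-9]+\"\\s*:\\s*\")([a-z0-9]+)" // \3 separator, \4 field2 value
        + "(\"[,\\r\\n]+\"[a-z0-9]+\"\\s*:\\s*\")([a-z0-9]+)" // \5 separator, \6 field3 value
        + "(\"[,\\r\\n]+\\})",                               // \7 closing quote and brace
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String apply(String input) {
        // Groups 1, 3, 4, 5 and 7 are copied back; group 2 becomes SomeText and group 6 becomes A2.
        return PATTERN.matcher(input).replaceAll("$1SomeText$3$4$5A2$7");
    }

    public static void main(String[] args) {
        String source = "{\"field1\" : \"A\",\n\"field2\" : \"A\",\n\"field3\": \"A\"\n}";
        System.out.println(apply(source));
    }
}
```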

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \{                       '{'
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [,\r\n]+                 any character of: ',', '\r' (carriage
                             return), '\n' (newline) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [,\r\n]+                 any character of: ',', '\r' (carriage
                             return), '\n' (newline) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  (                        group and capture to \6:
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \6
----------------------------------------------------------------------
  (                        group and capture to \7:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [,\r\n]+                 any character of: ',', '\r' (carriage
                             return), '\n' (newline) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    \}                       '}'
----------------------------------------------------------------------
  )                        end of \7

A similar question, "regex - Using ReplaceTextWithMapping with multiple columns in a mapping file", can be found on Stack Overflow: https://stackoverflow.com/questions/37237848/
