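I have a set of 4-column CSV data in which the first column holds the same value for 5 consecutive rows. The next 5 rows then share another first-column value, and so on.
Sample data: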
a,21,51,xxx
a,22,52,xxx
a,23,53,xxx
a,24,54,xxx
a,25,55,xxx
b,21,61,yyy
b,22,62,yyy
b,23,63,yyy
b,24,64,yyy
b,25,65,yyy
...........
But sometimes the records arrive in arbitrary order:
a,21,51,xxx
a,22,52,xxx
a,23,53,xxx
b,21,61,yyy
b,22,62,yyy
a,24,54,xxx
a,25,55,xxx
b,23,63,yyy
b,24,64,yyy
b,25,65,yyy
...........
Is there a way to group this kind of data by the first column using NiFi processors?
Any answer would be helpful.
Thanks
Best Answer
You should be able to do this with the RouteText processor using its Grouping Regular Expression property, which is documented as:
"Specifies a Regular Expression to evaluate against each line to determine which Group the line should be placed in. The Regular Expression must have at least one Capturing Group that defines the line's Group. If multiple Capturing Groups exist in the Regular Expression, the values from all Capturing Groups will be concatenated together. Two lines will not be placed into the same FlowFile unless they both have the same value for the Group (or neither line matches the Regular Expression). For example, to group together all lines in a CSV File by the first column, we can set this value to "(.*?),.*". Two lines that have the same Group but different Relationships will never be placed into the same FlowFile."
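To make the quoted rule concrete, here is a small Python sketch of how a grouping key could be derived from each line. It is not NiFi code. It assumes the whole line has to match the pattern (modelled with re.fullmatch) and that multiple capturing groups are joined with no separator; the function name group_key is hypothetical. The lazy (.*?) stops at the first comma, so the key is the first CSV column.

```python
import re

# Example pattern taken from the quoted RouteText documentation.
GROUPING_REGEX = re.compile(r"(.*?),.*")


def group_key(line, pattern=GROUPING_REGEX):
    """Hypothetical helper: return the group key for one line, or None.

    Assumption: the whole line must match (re.fullmatch). With several
    capturing groups, this sketch simply concatenates their values with
    no separator; NiFi's exact joining behaviour is not shown in the quote.
    """
    m = pattern.fullmatch(line)
    if m is None:
        # All non-matching lines share one "no group" key, mirroring
        # "(or neither line matches the Regular Expression)".
        return None
    return "".join(g or "" for g in m.groups())


if __name__ == "__main__":
    print(group_key("a,21,51,xxx"))  # first column only
    print(group_key("b,24,64,yyy"))
    print(group_key("a,21,51,xxx", re.compile(r"(.*?),(.*?),.*")))  # two groups
```

Lines that yield the same key belong in the same output FlowFile. That is why the arbitrary ordering in the question does not matter.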
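I think you can combine it with the "Matches Regular Expression" Matching Strategy and just use .* as that expression, so that every line matches.
Then, for the Grouping Regular Expression, use the example above, (.*?),.*, to group by the first column.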
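The combined setup (every line matched by .*, then split by the grouping key) can be simulated in Python on the out-of-order sample from the question. This is only a sketch of the expected result, not RouteText itself. It assumes a single incoming FlowFile, and the function name route_and_group is hypothetical. Within each group, the sketch keeps lines in their original relative order.

```python
import re
from collections import defaultdict

# Grouping pattern from the answer: capture everything before the first comma.
GROUPING_REGEX = re.compile(r"(.*?),.*")


def route_and_group(lines):
    """Hypothetical simulation: one output "FlowFile" (a string) per group key.

    Every line counts as matched (as with a ".*" matching expression), so all
    lines go to the same relationship and only the group key separates them.
    """
    groups = defaultdict(list)  # keys keep first-seen order
    for line in lines:
        m = GROUPING_REGEX.fullmatch(line)
        key = m.group(1) if m else None
        groups[key].append(line)
    return {key: "\n".join(group) for key, group in groups.items()}


if __name__ == "__main__":
    sample = [
        "a,21,51,xxx", "a,22,52,xxx", "a,23,53,xxx",
        "b,21,61,yyy", "b,22,62,yyy",
        "a,24,54,xxx", "a,25,55,xxx",
        "b,23,63,yyy", "b,24,64,yyy", "b,25,65,yyy",
    ]
    for key, content in route_and_group(sample).items():
        print(f"--- group {key} ---")
        print(content)
```

Each resulting group corresponds to one FlowFile that RouteText would emit for that first-column value.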
Regarding csv - Grouping csv data using NiFi processors, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42855149/