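csv - Grouping CSV data with a NiFi processor

I have CSV data with 4 columns, in which 5 consecutive rows share the same value in the first column. The next 5 rows again share the same first-column value, and so on.

Sample data: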

a,21,51,xxx
a,22,52,xxx
a,23,53,xxx
a,24,54,xxx
a,25,55,xxx
b,21,61,yyy
b,22,62,yyy
b,23,63,yyy
b,24,64,yyy
b,25,65,yyy
...........

But sometimes the records arrive in arbitrary order:

a,21,51,xxx
a,22,52,xxx
a,23,53,xxx
b,21,61,yyy
b,22,62,yyy
a,24,54,xxx
a,25,55,xxx
b,23,63,yyy
b,24,64,yyy
b,25,65,yyy
...........

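Is there a way to group this kind of data by the first column using a NiFi processor?

Any answer would be helpful.

Thanks

Best answer

You should be able to do this with the RouteText processor by using its Grouping Regular Expression property, whose documentation says: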

"Specifies a Regular Expression to evaluate against each line to determine which Group the line should be placed in. The Regular Expression must have at least one Capturing Group that defines the line's Group. If multiple Capturing Groups exist in the Regular Expression, the values from all Capturing Groups will be concatenated together. Two lines will not be placed into the same FlowFile unless they both have the same value for the Group (or neither line matches the Regular Expression). For example, to group together all lines in a CSV File by the first column, we can set this value to "(.*?),.*". Two lines that have the same Group but different Relationships will never be placed into the same FlowFile."

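I think you can combine it with a Matching Strategy of "Matches Regular Expression" and simply use .* as the expression so that every line matches.

Then, for the Grouping Regular Expression, use the example above, (.*?),.*, to group by the first column.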
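To see what grouping by the first column with `(.*?),.*` should produce on the out-of-order sample, here is a minimal Python sketch. It only imitates the grouping rule described in the documentation quote (lines with the same captured value end up together, and non-matching lines share one group). It is not NiFi code, and the function name `group_lines` is made up for illustration.

```python
import re

# Same pattern as the Grouping Regular Expression suggested in the answer.
GROUPING_REGEX = re.compile(r"(.*?),.*")


def group_lines(lines):
    """Group lines by the concatenated capturing groups of GROUPING_REGEX.

    Hypothetical helper: lines that do not match the pattern are collected
    under the key None, mirroring the "neither line matches" case.
    """
    groups = {}
    for line in lines:
        match = GROUPING_REGEX.fullmatch(line)
        key = "".join(g or "" for g in match.groups()) if match else None
        groups.setdefault(key, []).append(line)
    return groups


sample = [
    "a,21,51,xxx",
    "a,22,52,xxx",
    "a,23,53,xxx",
    "b,21,61,yyy",
    "b,22,62,yyy",
    "a,24,54,xxx",
    "a,25,55,xxx",
    "b,23,63,yyy",
    "b,24,64,yyy",
    "b,25,65,yyy",
]

# Print each group key followed by the lines that belong to it.
for key, members in group_lines(sample).items():
    print(key)
    for member in members:
        print("  " + member)
```

Each key here stands in for one output FlowFile, so all the "a" rows land together and all the "b" rows land together regardless of the order in which they arrived.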

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/42855149/
