hadoop - Increasing file size in Flume with a memory channel

Tags: hadoop hdfs bigdata flume flume-ng

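Below is my Flume configuration file. Even after changing rollInterval and rollSize, only 10 events are written to each file, and the console shows rollCount=10 and events=10. I also tried increasing rollCount to 1000, but the output did not change. Can anyone suggest how to increase the size of the files written to HDFS? What is wrong with the conf file below?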

#naming components 

NetAgent.sources = NetCat_1 NetCat_2
NetAgent.sinks = HDFS
NetAgent.channels = MemChannel


NetAgent.sources.NetCat_1.type = netcat
NetAgent.sources.NetCat_1.bind = localhost
NetAgent.sources.NetCat_1.port = 8671

NetAgent.sources.NetCat_2.type = netcat
NetAgent.sources.NetCat_2.bind = localhost
NetAgent.sources.NetCat_2.port = 8672


NetAgent.sinks.HDFS.type = hdfs
NetAgent.sinks.HDFS.hdfs.path = file path here
NetAgent.sinks.HDFS.hdfs.filePrefix = test
NetAgent.sinks.HDFS.hdfs.rollSize = 67108864
NetAgent.sinks.HDFS.hdfs.rollInterval = 3600
NetAgent.sinks.HDFS.rollCount = 0
NetAgent.sinks.HDFS.hdfs.batchSize = 10000
NetAgent.sinks.HDFS.hdfs.writeFormat = Text
NetAgent.sinks.HDFS.hdfs.fileType = DataStream


NetAgent.channels.MemChannel.type = memory
NetAgent.channels.MemChannel.capacity = 20000
NetAgent.channels.MemChannel.transactionCapacity = 20000


NetAgent.sources.NetCat_1.channels = MemChannel
NetAgent.sources.NetCat_2.channels = MemChannel
NetAgent.sinks.HDFS.channel = MemChannel

The console log shows:

(SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java)]
rolling: rollCount: 10, events: 10

[Image: the files written to HDFS]

Best answer

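You left the hdfs prefix off your rollCount setting. The sink falls back to the default value of 10 because it never sees your setting. Note that your HDFS sink configuration is: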

NetAgent.sinks.HDFS.type = hdfs
NetAgent.sinks.HDFS.hdfs.rollSize = 67108864
NetAgent.sinks.HDFS.hdfs.rollInterval = 3600
NetAgent.sinks.HDFS.rollCount = 0
NetAgent.sinks.HDFS.hdfs.batchSize = 10000
NetAgent.sinks.HDFS.hdfs.writeFormat = Text
NetAgent.sinks.HDFS.hdfs.fileType = DataStream

The rollCount line needs to be:

NetAgent.sinks.HDFS.hdfs.rollCount = 0

This overrides the default rollCount, and your Flume agent will then behave the way you want.
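
For reference, the full sink section with the fix applied might look like the sketch below. It reuses the values from the question unchanged, and hdfs.path is left as the placeholder the question used. The comments describe how the Flume HDFS sink interprets each roll setting; a value of 0 turns that trigger off.

```properties
# HDFS sink with every roll setting under the hdfs. prefix
NetAgent.sinks.HDFS.type = hdfs
NetAgent.sinks.HDFS.hdfs.path = file path here
NetAgent.sinks.HDFS.hdfs.filePrefix = test

# Roll the file once it reaches 67108864 bytes (64 MiB)
NetAgent.sinks.HDFS.hdfs.rollSize = 67108864

# Roll the file after 3600 seconds (1 hour)
NetAgent.sinks.HDFS.hdfs.rollInterval = 3600

# 0 disables rolling by event count (the default is 10)
NetAgent.sinks.HDFS.hdfs.rollCount = 0

NetAgent.sinks.HDFS.hdfs.batchSize = 10000
NetAgent.sinks.HDFS.hdfs.writeFormat = Text
NetAgent.sinks.HDFS.hdfs.fileType = DataStream
```

With these values a file is closed when it reaches 64 MiB or after one hour, whichever comes first. If the netcat sources deliver little data, the one-hour interval will probably fire before the size limit, so files can still end up smaller than 64 MiB.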

Regarding "hadoop - Increasing file size in Flume with a memory channel", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34459610/
