apache - Unable to get log data from Flume into HDFS on Hadoop

Tags: apache hadoop hdfs flume

I am using the following configuration to push data from a log file to HDFS.

agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity=5000
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /home/training/Downloads/log.txt
agent.sources.tail-source.channels = memory-channel
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.batchSize=10
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/user/flume/data/log.txt
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink
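One detail in the configuration above is easy to miss: the HDFS sink reads its settings with an hdfs. prefix, so agent.sinks.hdfs-sink.batchSize=10 is most likely ignored and the sink falls back to its default hdfs.batchSize (100). The following is a minimal, hypothetical checker (not part of Flume) that flags unprefixed HDFS sink options and also reports keys assigned more than once (the last assignment silently wins). It assumes plain one-line key = value properties without continuation lines or escapes, and HDFS_PREFIXED_OPTIONS is an illustrative subset of the sink's options, not a complete list.

```python
from collections import Counter

# Illustrative (incomplete) set of HDFS sink options that Flume only reads
# when written with the "hdfs." prefix, e.g. "hdfs.batchSize".
HDFS_PREFIXED_OPTIONS = {
    "path", "batchSize", "fileType", "writeFormat",
    "rollSize", "rollCount", "rollInterval", "fileSuffix", "filePrefix",
}


def parse_properties(text):
    """Parse simple 'key = value' lines into a list of (key, value) pairs.

    Assumes no line continuations or escape sequences.
    """
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def lint_flume_config(text, agent):
    """Return warnings about duplicated keys and unprefixed HDFS sink options."""
    pairs = parse_properties(text)
    props = dict(pairs)  # later duplicates win, as with java.util.Properties
    warnings = []

    # Keys assigned more than once.
    for key, count in Counter(k for k, _ in pairs).items():
        if count > 1:
            warnings.append(f"{key} is set {count} times")

    # Names of sinks declared with type = hdfs.
    sink_prefix = f"{agent}.sinks."
    hdfs_sinks = {
        key[len(sink_prefix):-len(".type")]
        for key, value in props.items()
        if key.startswith(sink_prefix) and key.endswith(".type") and value == "hdfs"
    }

    # Options of those sinks that should carry the "hdfs." prefix.
    for sink in sorted(hdfs_sinks):
        base = f"{sink_prefix}{sink}."
        for key in props:
            if key.startswith(base) and key[len(base):] in HDFS_PREFIXED_OPTIONS:
                option = key[len(base):]
                warnings.append(f"{key} is ignored; use {base}hdfs.{option}")
    return warnings
```

Run against the question's configuration with agent "agent", it should report the repeated agent.channels, agent.sources and agent.sinks keys and suggest agent.sinks.hdfs-sink.hdfs.batchSize instead of agent.sinks.hdfs-sink.batchSize.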

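I am not getting any error messages, but I still cannot find the output in HDFS. When I interrupt the agent, I can see a sink interrupted exception along with some data from the log file. I am running the following command: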

flume-ng agent --conf /etc/flume-ng/conf/ --conf-file /etc/flume-ng/conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent;
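Two HDFS sink behaviours are worth checking here (a possible explanation, not a confirmed diagnosis): hdfs.path names a directory, so .../data/log.txt is created as a directory rather than a file, and the files inside it are named from hdfs.filePrefix (default FlumeData) plus a numeric counter, carrying the in-use suffix .tmp (hdfs.inUseSuffix) until the sink closes them. The sketch below builds a regular expression for such names; flume_output_pattern and classify_flume_file are hypothetical helpers, and the counter is matched loosely as digits.

```python
import re
from urllib.parse import urlparse


def flume_output_pattern(hdfs_path, file_prefix="FlumeData", file_suffix="",
                         in_use_suffix=".tmp"):
    """Regex for files an HDFS sink may create under hdfs_path.

    hdfs_path is treated as a directory. The scheme and authority
    (hdfs://host:port) are dropped so the pattern matches plain HDFS paths.
    """
    directory = urlparse(hdfs_path).path.rstrip("/")
    return re.compile(
        re.escape(f"{directory}/{file_prefix}.")
        + r"\d+"                                   # counter chosen by the sink
        + re.escape(file_suffix)
        + "(" + re.escape(in_use_suffix) + ")?$"   # present while the file is open
    )


def classify_flume_file(path, pattern):
    """Say whether path looks like an open or closed file from this sink."""
    match = pattern.match(urlparse(path).path)
    if match is None:
        return "not written by this sink"
    return "still open (in use)" if match.group(1) else "closed (rolled)"
```

For example, the paths printed by hdfs dfs -ls /user/flume/data/log.txt can be passed to classify_flume_file with flume_output_pattern("hdfs://localhost:8020/user/flume/data/log.txt") to see whether any files exist and whether they are still open.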

Best answer

I ran into a similar problem, and in my case it is working now. Here is my configuration file:

#Exec Source
execAgent.sources=e
execAgent.channels=memchannel
execAgent.sinks=HDFS
#Define Channel (despite its name, memchannel is a file channel)
execAgent.channels.memchannel.type=file
execAgent.channels.memchannel.capacity = 20000
execAgent.channels.memchannel.transactionCapacity = 1000
#Define Source
execAgent.sources.e.type=org.apache.flume.source.ExecSource
execAgent.sources.e.channels=memchannel
execAgent.sources.e.shell=/bin/bash -c
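#fileHeader and fileSuffix are not exec source options (they belong to the spooling directory source), so the exec source ignores them
execAgent.sources.e.fileHeader=false
execAgent.sources.e.fileSuffix=.txt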
execAgent.sources.e.command=cat /home/sample.txt
#Define Sink
execAgent.sinks.HDFS.type=hdfs
execAgent.sinks.HDFS.hdfs.path=hdfs://localhost:8020/user/flume/
execAgent.sinks.HDFS.hdfs.fileType=DataStream
execAgent.sinks.HDFS.hdfs.writeFormat=Text
execAgent.sinks.HDFS.hdfs.batchSize=1000
execAgent.sinks.HDFS.hdfs.rollSize=268435
execAgent.sinks.HDFS.hdfs.rollInterval=0
#Bind Source Sink Channel
execAgent.sources.e.channels=memchannel
execAgent.sinks.HDFS.channel=memchannel
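In the answer's sink settings, hdfs.rollInterval=0 switches off time-based rolling and hdfs.rollSize=268435 rolls after roughly 262 KiB; hdfs.rollCount is not set, so its default of 10 events still applies. The HDFS sink starts a new file once any enabled (non-zero) trigger has been reached, checked before each write. The sketch below is a simplified Python model of that rule, not Flume's code: it ignores time, idle timeouts and batching, RollPolicy and files_created are hypothetical names, and the defaults are the documented hdfs.rollInterval (30 s), hdfs.rollSize (1024 bytes) and hdfs.rollCount (10).

```python
from dataclasses import dataclass


@dataclass
class RollPolicy:
    roll_interval: int = 30   # seconds, hdfs.rollInterval (0 disables)
    roll_size: int = 1024     # bytes, hdfs.rollSize (0 disables)
    roll_count: int = 10      # events, hdfs.rollCount (0 disables)

    def should_roll(self, elapsed_s, bytes_written, events_written):
        """True if any enabled threshold has been reached for the open file."""
        if self.roll_interval and elapsed_s >= self.roll_interval:
            return True
        if self.roll_size and bytes_written >= self.roll_size:
            return True
        if self.roll_count and events_written >= self.roll_count:
            return True
        return False


def files_created(policy, n_events, event_bytes):
    """Count files one bucket would open for n_events equal-sized events.

    Time is treated as zero, so only size and count triggers matter.
    """
    files, size, count = 0, 0, 0
    for _ in range(n_events):
        # The rotation check happens before the event is written.
        if files == 0 or policy.should_roll(0, size, count):
            files += 1
            size, count = 0, 0
        size += event_bytes
        count += 1
    return files
```

Under this model, 25 events of 100 bytes with the answer's settings (RollPolicy(0, 268435, 10)) end up in three files because of the default count trigger; adding hdfs.rollCount=0 would keep them in one file until the size threshold is reached.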

Regarding "apache - Unable to get log data from Flume into HDFS on Hadoop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29456309/
