I am new to Flume-NG and need help tailing a file. I have a cluster running Hadoop, with Flume running on it remotely. I communicate with that cluster using PuTTY. I want to tail a file on my PC and put it into HDFS on the cluster. I am using the following configuration.
#flume.conf: http source, hdfs sink
# Name the components on this agent
tier1.sources = r1
tier1.sinks = k1
tier1.channels = c1
# Describe/configure the source
tier1.sources.r1.type = exec
tier1.sources.r1.command = tail -F /(Path to file on my PC)
# Describe the sink
tier1.sinks.k1.type = hdfs
tier1.sinks.k1.hdfs.path = /user/ntimbadi/flume/
tier1.sinks.k1.hdfs.filePrefix = events-
tier1.sinks.k1.hdfs.round = true
tier1.sinks.k1.hdfs.roundValue = 10
tier1.sinks.k1.hdfs.roundUnit = minute
# Use a channel which buffers events in memory
tier1.channels.c1.type = memory
tier1.channels.c1.capacity = 1000
tier1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
tier1.sources.r1.channels = c1
tier1.sinks.k1.channel = c1
I believe the error is in the source. This kind of source does not do a hostname or IP lookup (which, in this case, would need to point at my PC). Can someone give me a hint on how to tail a file on my PC and upload it to the remote HDFS using Flume?
Best answer
The exec source in your configuration will run on the machine where the tier1 agent is started. If you want to collect data from another machine, you need to start a Flume agent on that machine as well. To sum up, you need:

- an agent (remote1) running on the remote machine with an avro source, which will listen for events from collector agents and act as an aggregator;
- an agent (local1) on your PC (acting as a collector) with an exec source, which sends data to the remote agent through an avro sink.

Alternatively, you can run a single Flume agent on your local machine (with the same configuration you posted) and set the HDFS path to "hdfs://REMOTE_IP/hdfs/path" (though I am not entirely sure this will work).
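The single-agent alternative mentioned above could be sketched as the fragment below: the same sink as in the question, but with a fully qualified HDFS URI pointing at the remote NameNode. REMOTE_IP is a placeholder, and the NameNode port (8020 is a common default) is an assumption that depends on the cluster's configuration; this also requires the local machine to be able to reach the NameNode and DataNodes directly.

```properties
# Single local agent writing directly to the remote HDFS.
# REMOTE_IP and the port 8020 are placeholders -- adjust to your cluster.
tier1.sinks.k1.type = hdfs
tier1.sinks.k1.hdfs.path = hdfs://REMOTE_IP:8020/user/ntimbadi/flume/
```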
EDIT:
Below are example configurations for the two-agent scenario (they may not work without some modification).
remote1.channels.mem-ch-1.type = memory
remote1.sources.avro-src-1.channels = mem-ch-1
remote1.sources.avro-src-1.type = avro
remote1.sources.avro-src-1.port = 10060
# Replace with your machine's external IP
remote1.sources.avro-src-1.bind = 10.88.66.4
remote1.sinks.k1.channel = mem-ch-1
remote1.sinks.k1.type = hdfs
remote1.sinks.k1.hdfs.path = /user/ntimbadi/flume/
remote1.sinks.k1.hdfs.filePrefix = events-
remote1.sinks.k1.hdfs.round = true
remote1.sinks.k1.hdfs.roundValue = 10
remote1.sinks.k1.hdfs.roundUnit = minute
remote1.sources = avro-src-1
remote1.sinks = k1
remote1.channels = mem-ch-1
and
local1.channels.mem-ch-1.type = memory
local1.sources.exc-src-1.channels = mem-ch-1
local1.sources.exc-src-1.type = exec
local1.sources.exc-src-1.command = tail -F /(Path to file on my PC)
local1.sinks.avro-snk-1.channel = mem-ch-1
local1.sinks.avro-snk-1.type = avro
# Replace with the remote machine's IP
local1.sinks.avro-snk-1.hostname = 10.88.66.4
local1.sinks.avro-snk-1.port = 10060
local1.sources = exc-src-1
local1.sinks = avro-snk-1
local1.channels = mem-ch-1
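Assuming a standard Flume installation on both machines, the two agents could be launched with the `flume-ng` command roughly as follows. The config file names are hypothetical; start the remote1 agent first so that the local avro sink has something to connect to.

```shell
# On the remote (cluster) machine -- start the aggregator first:
flume-ng agent --conf conf --conf-file remote1.conf --name remote1 \
    -Dflume.root.logger=INFO,console

# On your PC -- start the collector:
flume-ng agent --conf conf --conf-file local1.conf --name local1 \
    -Dflume.root.logger=INFO,console
```

Note that `--name` must match the agent name used as the property prefix in each file (remote1 and local1 respectively), otherwise the agent starts with no components.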
Regarding hadoop - Flume tail a file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16967822/