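hadoop - Tailing a file with Flume

I am new to Flume-NG and need help tailing a file. I have a cluster running Hadoop, and Flume runs remotely on that cluster. I communicate with the cluster using PuTTY. I want to tail a file on my PC and put its contents into HDFS on the cluster. I am using the following configuration.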

# flume.conf: exec source, hdfs sink
# Name the components on this agent 

tier1.sources = r1
tier1.sinks = k1
tier1.channels = c1


# Describe/configure the source
tier1.sources.r1.type = exec
tier1.sources.r1.command = tail -F /(Path to file on my PC)


# Describe the sink
tier1.sinks.k1.type = hdfs
tier1.sinks.k1.hdfs.path = /user/ntimbadi/flume/
tier1.sinks.k1.hdfs.filePrefix = events-
tier1.sinks.k1.hdfs.round = true
tier1.sinks.k1.hdfs.roundValue = 10
tier1.sinks.k1.hdfs.roundUnit = minute



# Use a channel which buffers events in memory
tier1.channels.c1.type = memory
tier1.channels.c1.capacity = 1000
tier1.channels.c1.transactionCapacity = 100


# Bind the source and sink to the channel
tier1.sources.r1.channels = c1
tier1.sinks.k1.channel = c1

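I believe the error is in the source. This type of source does not take a hostname or IP address to look up (which in this case would be my PC). Can someone give me a hint on how to tail a file on my PC and upload it to the remote HDFS using Flume?

Best answer

The exec source in your configuration runs on the machine where the Flume tier1 agent is started. If you want to collect data from another machine, you need to start a Flume agent on that machine as well. To sum up, you need:

An agent (remote1) running on the remote machine with an avro source, which listens for events from collector agents and acts as the aggregator.

An agent (local1) running on your machine (acting as the collector) with an exec source, which sends data to the remote agent through an avro sink.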

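Alternatively, you could run a single Flume agent on your local machine (with the same configuration you posted) and set the hdfs path to "hdfs://REMOTE_IP/hdfs/path" (although I am not entirely sure this will work).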
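The single-agent alternative could look roughly like the sketch below. It is the configuration from the question with only the sink path changed to a full HDFS URI. REMOTE_IP is a placeholder, and port 8020 is an assumption (a commonly used NameNode RPC port); the actual value is whatever fs.defaultFS is set to on the cluster. This is untested and would also require the Hadoop client libraries on the local Flume classpath and network access from the PC to the NameNode and DataNodes.

```properties
# Single agent running on the PC: exec source -> memory channel -> HDFS sink on the remote cluster
tier1.sources = r1
tier1.sinks = k1
tier1.channels = c1

# The exec source tails the file on the machine where this agent runs
tier1.sources.r1.type = exec
tier1.sources.r1.command = tail -F /(Path to file on my PC)
tier1.sources.r1.channels = c1

# ASSUMPTION: REMOTE_IP and port 8020 are placeholders; use the host and port from the cluster's fs.defaultFS
tier1.sinks.k1.type = hdfs
tier1.sinks.k1.hdfs.path = hdfs://REMOTE_IP:8020/user/ntimbadi/flume/
tier1.sinks.k1.hdfs.filePrefix = events-
tier1.sinks.k1.channel = c1

# Memory channel with the same sizing as in the question
tier1.channels.c1.type = memory
tier1.channels.c1.capacity = 1000
tier1.channels.c1.transactionCapacity = 100
```

If the cluster is only reachable through an SSH session, this route is unlikely to work, and the two-agent setup is the safer choice.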

Edit:
Below are sample configurations for the two-agent scenario (they may not work without some modification).

remote1.channels.mem-ch-1.type = memory

remote1.sources.avro-src-1.channels = mem-ch-1
remote1.sources.avro-src-1.type = avro
remote1.sources.avro-src-1.port = 10060
# Replace with the external IP of the machine this agent runs on
remote1.sources.avro-src-1.bind = 10.88.66.4

remote1.sinks.k1.channel = mem-ch-1
remote1.sinks.k1.type = hdfs
remote1.sinks.k1.hdfs.path = /user/ntimbadi/flume/
remote1.sinks.k1.hdfs.filePrefix = events-
remote1.sinks.k1.hdfs.round = true
remote1.sinks.k1.hdfs.roundValue = 10
remote1.sinks.k1.hdfs.roundUnit = minute

remote1.sources = avro-src-1
remote1.sinks = k1
remote1.channels = mem-ch-1

and

local1.channels.mem-ch-1.type = memory

local1.sources.exc-src-1.channels = mem-ch-1
local1.sources.exc-src-1.type = exec
local1.sources.exc-src-1.command = tail -F /(Path to file on my PC)

local1.sinks.avro-snk-1.channel = mem-ch-1
local1.sinks.avro-snk-1.type = avro
# Replace with the remote machine's IP
local1.sinks.avro-snk-1.hostname = 10.88.66.4
local1.sinks.avro-snk-1.port = 10060

local1.sources = exc-src-1
local1.sinks = avro-snk-1
local1.channels = mem-ch-1

Regarding hadoop - tailing a file with Flume, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16967822/