hadoop - Problem fetching Twitter data into HDFS with Flume

Tags: hadoop twitter hdfs flume flume-twitter

I am trying to fetch Twitter data into HDFS, but I am running into a problem.

This is my flume.conf file:

TwitterAgent.sources= Twitter
TwitterAgent.channels= MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel
TwitterAgent.sources.Twitter.consumerKey=xxxxxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret=    xxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken=xxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords= hadoop,election,sports, cricket,Big data
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:9000/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=1000
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100

In the Env.sh file, I have this path:
 #FLUME_CLASSPATH="/usr/lib/flume-sources-1.0-SNAPSHOT.jar"

I am now using the following command to fetch the data:
[cloudera@quickstart etc]$ flume-ng agent -n TwitterAgent -c conf -f /etc/flume-ng/conf/flume.conf

It prints some log output, but then the following error appears, and it hangs after the HDFS sink starts.
16/09/25 05:18:36 WARN conf.FlumeConfiguration: Could not configure source  Twitter due to: Component has no type. Cannot configure. Twitter
org.apache.flume.conf.ConfigurationException: Component has no type. Cannot configure. Twitter
    at org.apache.flume.conf.ComponentConfiguration.configure(ComponentConfiguration.java:76)
    at org.apache.flume.conf.source.SourceConfiguration.configure(SourceConfiguration.java:56)
    at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSources(FlumeConfiguration.java:567)
    at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:346)
    at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.access$000(FlumeConfiguration.java:213)
    at org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:127)
    at org.apache.flume.conf.FlumeConfiguration.<init>(FlumeConfiguration.java:109)
    at org.apache.flume.node.PropertiesFileConfigurationProvider.getFlumeConfiguration(PropertiesFileConfigurationProvider.java:189)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:89)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
16/09/25 05:18:36 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Creating channels
16/09/25 05:18:36 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Created channel MemChannel
16/09/25 05:18:36 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
16/09/25 05:18:36 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [HDFS]
16/09/25 05:18:36 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3963542c counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
16/09/25 05:18:36 INFO node.Application: Starting Channel MemChannel
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
16/09/25 05:18:36 INFO node.Application: Starting Sink HDFS
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
16/09/25 05:18:36 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started

Best answer

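The source is declared under the name Twitter (TwitterAgent.sources= Twitter), but its type is set under the name TwitterSource, so Flume finds no type for Twitter and never starts the source (note sourceRunners:{} in the log). In the configuration file, replace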

TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource

with
TwitterAgent.sources.Twitter.type=org.apache.flume.source.twitter.TwitterSource
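
This kind of name mismatch can be caught before the agent is started. The following is a minimal sketch, not part of Flume itself: a hypothetical helper, find_untyped_components, that reads the configuration text and reports every declared source, channel, or sink that has no matching .type key. It assumes plain key=value lines and whitespace-separated component lists, and it does not handle the full Java properties syntax (such as ':' separators or line continuations).

```python
def find_untyped_components(conf_text, agent):
    """Return (kind, name) pairs for declared components of `agent` lacking a .type key.

    Hypothetical helper for illustration; it is not a Flume API.
    """
    props = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments; split the rest on the first '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

    missing = []
    for kind in ("sources", "channels", "sinks"):
        # Component names are listed on one line, e.g. "TwitterAgent.sources= Twitter".
        declared = props.get(f"{agent}.{kind}", "").split()
        for name in declared:
            if f"{agent}.{kind}.{name}.type" not in props:
                missing.append((kind, name))
    return missing


# Trimmed copy of the configuration from the question, including the mismatch.
conf = """
TwitterAgent.sources= Twitter
TwitterAgent.channels= MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.TwitterSource.type=org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.channels.MemChannel.type=memory
"""

print(find_untyped_components(conf, "TwitterAgent"))  # [('sources', 'Twitter')]
```

Run against the configuration from the question, the check flags the same component that the log complains about; after renaming the key to TwitterAgent.sources.Twitter.type, it returns an empty list.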

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/39686821/
