hadoop - Extracting Twitter data with Flume

Tags: hadoop twitter4j flume

I ran this command to extract Twitter data with Flume:

[cloudera@localhost bin]$ ./flume-ng agent --conf ./conf/ -f ../conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

2015-07-14 05:42:00 [INFO ] Configuration provider starting
2015-07-14 05:42:00 [INFO ] Reloading configuration file:../conf/flume.conf
2015-07-14 05:42:00 [INFO ] Processing:HDFS
2015-07-14 05:42:00 [INFO ] Processing:HDFS
2015-07-14 05:42:00 [INFO ] Processing:HDFS
2015-07-14 05:42:00 [INFO ] Added sinks: HDFS Agent: TwitterAgent
2015-07-14 05:42:00 [INFO ] Processing:HDFS
2015-07-14 05:42:00 [INFO ] Processing:HDFS
2015-07-14 05:42:00 [INFO ] Processing:HDFS
2015-07-14 05:42:00 [INFO ] Processing:HDFS
2015-07-14 05:42:00 [INFO ] Processing:HDFS
2015-07-14 05:42:00 [INFO ] Post-validation flume configuration contains configuration for agents: [TwitterAgent]
2015-07-14 05:42:00 [INFO ] Creating channels
2015-07-14 05:42:00 [INFO ] Creating instance of channel MemChannel type memory
2015-07-14 05:42:00 [INFO ] Created channel MemChannel
2015-07-14 05:42:00 [INFO ] Creating instance of source Twitter, type org.apache.flume.source.twitter.TwitterSource

It gets this far and then fails with an error:

2015-07-14 05:42:01 [ERROR] Unhandled error
java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StatusListener;)V
    at org.apache.flume.source.twitter.TwitterSource.configure(TwitterSource.java:119)
    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:331)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:102)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2015-07-14 05:44:13 [INFO ] Stopping lifecycle supervisor 10
2015-07-14 05:44:13 [INFO ] Configuration provider stopping

Best answer

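java.lang.NoSuchMethodError <- this means you are using the wrong version of twitter4j: the twitter4j jar that Flume loads at runtime does not contain the method the Twitter source was compiled against.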
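The descriptor in the error, addListener(Ltwitter4j/StatusListener;)V, names a method addListener that takes a twitter4j.StatusListener and returns void. As a rough sketch, one way to check whether the twitter4j on a given classpath offers exactly that method, and which jar the class was loaded from, is plain reflection. The class SignatureCheck and its helper methods are hypothetical names made up for this illustration; they are not part of Flume or twitter4j.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical helper, not part of Flume or twitter4j.
class SignatureCheck {

    /**
     * Returns true if className has a public method with exactly these
     * parameter types and this return type. All types are given by name so
     * the check compiles even when the library is missing from the classpath.
     */
    static boolean hasMethod(String className, String methodName,
                             String returnTypeName, String... paramTypeNames) {
        try {
            Class<?> owner = Class.forName(className);
            Class<?>[] params = new Class<?>[paramTypeNames.length];
            for (int i = 0; i < params.length; i++) {
                params[i] = Class.forName(paramTypeNames[i]);
            }
            Method m = owner.getMethod(methodName, params);
            return m.getReturnType().getName().equals(returnTypeName);
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Returns the location (usually a jar) that the class was loaded from. */
    static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className)
                    .getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // The method named in the NoSuchMethodError from the question.
        boolean ok = hasMethod("twitter4j.TwitterStream", "addListener",
                               "void", "twitter4j.StatusListener");
        System.out.println("addListener(StatusListener) returning void present: " + ok);
        System.out.println("twitter4j.TwitterStream loaded from: "
                + loadedFrom("twitter4j.TwitterStream"));
    }
}
```

To be meaningful, the check has to run with the same jars Flume sees, for example with Flume's lib directory on the classpath (the exact location depends on the installation and is an assumption here).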

The twitter-source module of the current Flume release (1.6.0) is built against twitter4j 3.0.3:

[INFO] +- org.twitter4j:twitter4j-core:jar:3.0.3:compile
[INFO] +- org.twitter4j:twitter4j-media-support:jar:3.0.3:compile
[INFO] \- org.twitter4j:twitter4j-stream:jar:3.0.3:compile

Just replace your twitter4j libraries with those versions and it will work.
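Before swapping jars, it can help to see which twitter4j jars are actually present. The following is a minimal sketch that lists twitter4j jars in a directory whose file name does not end in the expected version. It assumes the jars follow the usual Maven naming (for example twitter4j-core-3.0.3.jar) and that the directory passed in is the one Flume loads libraries from; both are assumptions, and Twitter4jJarAudit is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Flume or twitter4j.
class Twitter4jJarAudit {

    /**
     * Returns the sorted names of twitter4j-*.jar files in dir whose name
     * does not end with "-<expectedVersion>.jar" (Maven-style naming assumed).
     */
    static List<String> mismatched(Path dir, String expectedVersion) throws IOException {
        List<String> bad = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "twitter4j-*.jar")) {
            for (Path jar : jars) {
                String name = jar.getFileName().toString();
                if (!name.endsWith("-" + expectedVersion + ".jar")) {
                    bad.add(name);
                }
            }
        }
        Collections.sort(bad);
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to "lib" relative to the working directory.
        Path lib = Paths.get(args.length > 0 ? args[0] : "lib");
        if (!Files.isDirectory(lib)) {
            System.out.println("Not a directory: " + lib);
            return;
        }
        // 3.0.3 is the version the answer says flume 1.6.0's twitter-source uses.
        List<String> bad = mismatched(lib, "3.0.3");
        if (bad.isEmpty()) {
            System.out.println("No twitter4j jars with a version other than 3.0.3 found.");
        } else {
            System.out.println("twitter4j jars to replace: " + bad);
        }
    }
}
```

The check only looks at file names, so a renamed jar would slip through; it is meant as a quick first look, not as proof of which classes are loaded.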

Regarding extracting Twitter data with hadoop and Flume, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31407500/
