twitter - Multiple Flume Twitter agents

Tags: twitter hadoop flume

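I'm learning Hadoop, Flume, etc., and one of the first projects I started is sentiment analysis. That works fine, but now I'm trying to extend it by collecting several sets of data. This is my flume.conf: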

    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS HDFS2
    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = xxx
    TwitterAgent.sources.Twitter.consumerSecret = xxxx
    TwitterAgent.sources.Twitter.accessToken = xxx
    TwitterAgent.sources.Twitter.accessTokenSecret = xxxx
    TwitterAgent.sources.Twitter.keywords = bbc
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://xxx:8020/user/flume/tweets/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100

What I want is to put all tweets about bbc in the location above, and also put tweets about liverpool into a separate folder, using the following config:

    TwitterAgent.sources.Twitter.keywords = liverpool
    TwitterAgent.sinks.HDFS2.channel = MemChannel
    TwitterAgent.sinks.HDFS2.type = hdfs
    TwitterAgent.sinks.HDFS2.hdfs.path = hdfs://xxx:8020/user/flume/tweets/liverpool/
    TwitterAgent.sinks.HDFS2.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS2.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS2.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS2.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS2.hdfs.rollCount = 10000
    TwitterAgent.channels.MemChannel2.type = memory
    TwitterAgent.channels.MemChannel2.capacity = 10000
    TwitterAgent.channels.MemChannel2.transactionCapacity = 10

This doesn't work and I can't figure out why. Can anyone point me in the right direction?

Best answer

This answer may be a bit late, but I think it doesn't work because a single application can only hold one open connection to the Twitter Streaming API.

https://dev.twitter.com/discussions/14935

https://dev.twitter.com/discussions/7542

Arne Roomann-Kurrik (@kurrik) on the Twitter developer forum:

> Which streaming endpoint are you using?
>
> For general streams, you should only make one connection from the same IP. For userstreams, one or two connections from the same IP. For site streams, multiple connections are supported (note that site streams is still in limited beta).
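A minimal sketch of the single-connection approach, assuming the same com.cloudera.flume.source.TwitterSource as in the question: one source tracks both keywords, Flume's regex_extractor interceptor copies the word "liverpool" (if it appears in the tweet JSON) into an event header, and a multiplexing channel selector routes on that header so each HDFS sink drains its own channel. The header name `topic` and the interceptor/serializer names `kw`/`s1` are hypothetical placeholders. This layout also sidesteps three problems in the posted config: the second `keywords` line overrides the first (the file is loaded as Java properties, so the last value for a key wins), MemChannel2 is never listed in `TwitterAgent.channels`, and HDFS2 reads from MemChannel, so the two sinks would split one stream between them instead of writing two.

    # Sketch only: one streaming connection, two HDFS directories.
    # Hypothetical names: header "topic", interceptor "kw", serializer "s1".
    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel MemChannel2
    TwitterAgent.sinks = HDFS HDFS2
    # One source, both keywords (comma-separated) on a single connection
    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel MemChannel2
    TwitterAgent.sources.Twitter.consumerKey = xxx
    TwitterAgent.sources.Twitter.consumerSecret = xxxx
    TwitterAgent.sources.Twitter.accessToken = xxx
    TwitterAgent.sources.Twitter.accessTokenSecret = xxxx
    TwitterAgent.sources.Twitter.keywords = bbc, liverpool
    # Copy the first "liverpool"/"Liverpool" found in the event body into header "topic"
    TwitterAgent.sources.Twitter.interceptors = kw
    TwitterAgent.sources.Twitter.interceptors.kw.type = regex_extractor
    TwitterAgent.sources.Twitter.interceptors.kw.regex = ([Ll]iverpool)
    TwitterAgent.sources.Twitter.interceptors.kw.serializers = s1
    TwitterAgent.sources.Twitter.interceptors.kw.serializers.s1.name = topic
    # Route on "topic"; events with no mapped value fall through to MemChannel (bbc)
    TwitterAgent.sources.Twitter.selector.type = multiplexing
    TwitterAgent.sources.Twitter.selector.header = topic
    TwitterAgent.sources.Twitter.selector.mapping.liverpool = MemChannel2
    TwitterAgent.sources.Twitter.selector.mapping.Liverpool = MemChannel2
    TwitterAgent.sources.Twitter.selector.default = MemChannel
    # bbc (default) sink, same settings as in the question
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://xxx:8020/user/flume/tweets/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
    # liverpool sink drains only its own channel
    TwitterAgent.sinks.HDFS2.channel = MemChannel2
    TwitterAgent.sinks.HDFS2.type = hdfs
    TwitterAgent.sinks.HDFS2.hdfs.path = hdfs://xxx:8020/user/flume/tweets/liverpool/
    TwitterAgent.sinks.HDFS2.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS2.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS2.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS2.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS2.hdfs.rollCount = 10000
    # transactionCapacity kept >= hdfs.batchSize so a full sink batch fits in one transaction
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000
    TwitterAgent.channels.MemChannel2.type = memory
    TwitterAgent.channels.MemChannel2.capacity = 10000
    TwitterAgent.channels.MemChannel2.transactionCapacity = 1000

The routing is approximate: the regex matches anywhere in the raw JSON (user bio, retweeted text, URLs), only the two spellings mapped above are sent to MemChannel2, and a tweet mentioning both bbc and Liverpool lands only in the liverpool folder.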

A similar question, "twitter - Multiple Flume Twitter agents", is on Stack Overflow: https://stackoverflow.com/questions/21333532/
