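I'm learning Hadoop, Flume and related tools, and one of the projects I started with is sentiment analysis. It works reasonably well, but now I'm trying to extend it by collecting several sets of data. This is my flume.conf: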
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS HDFS2
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxxx
TwitterAgent.sources.Twitter.accessToken = xxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxx
TwitterAgent.sources.Twitter.keywords = bbc
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://xxx:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
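What I want to achieve is to put all tweets about bbc in the location above, and also to put tweets about liverpool into a separate folder using the following configuration: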
TwitterAgent.sources.Twitter.keywords = liverpool
TwitterAgent.sinks.HDFS2.channel = MemChannel
TwitterAgent.sinks.HDFS2.type = hdfs
TwitterAgent.sinks.HDFS2.hdfs.path = hdfs://xxx:8020/user/flume/tweets/liverpool/
TwitterAgent.sinks.HDFS2.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS2.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS2.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS2.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS2.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel2.type = memory
TwitterAgent.channels.MemChannel2.capacity = 10000
TwitterAgent.channels.MemChannel2.transactionCapacity = 10
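This doesn't work and I can't figure out why. Can someone point me in the right direction?
Best answer
This answer may be a bit late, but I think it doesn't work because you can only have one open connection to the Twitter Streaming API from the same application.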
https://dev.twitter.com/discussions/14935
https://dev.twitter.com/discussions/7542
@kurrik Arne Roomann-Kurrik Which streaming endpoint are you using?
For general streams, you should only make one connection from the same IP. For userstreams, one or two connections from the same IP. For site streams, multiple connections are supported (note that site streams is still in limited beta).
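Given the one-connection limit, one way around it might be to keep a single Twitter source that tracks both keywords and fan the events out to two channels, each drained by its own HDFS sink. The following is only a sketch, not a tested setup. It assumes the Cloudera TwitterSource accepts a comma-separated keywords list, and it uses Flume's regex_extractor interceptor plus a multiplexing channel selector to route events. The interceptor name "topic", the header name "topic" and the regex are my own choices for illustration. Two things in the posted config would likely cause trouble regardless of the connection limit: if both fragments live in the same file, the second keywords line replaces the first, and MemChannel2 is configured but never listed in TwitterAgent.channels, so both sinks drain the same channel and split its events between them.

```properties
# Sketch: one source (one streaming connection), two channels, two sinks.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel MemChannel2
TwitterAgent.sinks = HDFS HDFS2

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel MemChannel2
# consumerKey / consumerSecret / accessToken / accessTokenSecret as in the question
# Assumption: the source splits this value on commas.
TwitterAgent.sources.Twitter.keywords = bbc, liverpool

# Assumption: tag each event with a "topic" header taken from the first
# keyword found in the body. The match is case-sensitive, so "BBC" gets
# no header and falls through to the selector default below.
TwitterAgent.sources.Twitter.interceptors = topic
TwitterAgent.sources.Twitter.interceptors.topic.type = regex_extractor
TwitterAgent.sources.Twitter.interceptors.topic.regex = (bbc|liverpool)
TwitterAgent.sources.Twitter.interceptors.topic.serializers = s1
TwitterAgent.sources.Twitter.interceptors.topic.serializers.s1.name = topic

# Route on the header value; unmatched events go to the default channel.
TwitterAgent.sources.Twitter.selector.type = multiplexing
TwitterAgent.sources.Twitter.selector.header = topic
TwitterAgent.sources.Twitter.selector.mapping.bbc = MemChannel
TwitterAgent.sources.Twitter.selector.mapping.liverpool = MemChannel2
TwitterAgent.sources.Twitter.selector.default = MemChannel

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://xxx:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000

TwitterAgent.sinks.HDFS2.channel = MemChannel2
TwitterAgent.sinks.HDFS2.type = hdfs
TwitterAgent.sinks.HDFS2.hdfs.path = hdfs://xxx:8020/user/flume/tweets/liverpool/
TwitterAgent.sinks.HDFS2.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS2.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS2.hdfs.batchSize = 1000

# transactionCapacity kept at least as large as the sink batchSize.
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
TwitterAgent.channels.MemChannel2.type = memory
TwitterAgent.channels.MemChannel2.capacity = 10000
TwitterAgent.channels.MemChannel2.transactionCapacity = 1000
```

Because the regex runs over the whole event body, a tweet that mentions both words is routed by whichever match comes first, and a keyword appearing in a URL or username also counts. If that matters, filtering by keyword later in the analysis step may be more reliable than routing in Flume.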
About twitter - Multiple Flume Twitter agents: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21333532/