python - Download full JSON data for tweets using the REST API and Tweepy, querying by tweet ID

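I'm brand new to both tweepy and Twitter's API, and I've realized (too late) that I made some mistakes while collecting Twitter data. I have been collecting tweets about the Winter Olympics, using the Streaming API to filter by search terms. However, instead of retrieving all the available data, I retrieved only the text, datetime, and tweet ID. An example of the stream listener I implemented is below: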

import os
import sys
import tweepy

os.chdir('/my/preferred/location/Twitter Olympics/Data')

consumer_key = 'cons_key'
consumer_secret = 'cons_sec'
access_token = 'access_token'
access_secret = 'access_sec'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

# count is used to give an approximation of how many tweets I'm pulling at a given time.

count = []
f = open('feb24.txt', 'a')

class StreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print 'Running...'
        info = status.text, status.created_at, status.id
        f.write(str(info))
        for i in info:
          count.append(1)

    def on_error(self, status_code):
        print >> sys.stderr, "Encountered error with status code: ", status_code

    def on_timeout(self):
        print >> sys.stderr, "Timeout..."
        return True

sapi = tweepy.streaming.Stream(auth, StreamListener())
sapi.filter(track=["olympics", "olympics 2014", "sochi", "Sochi2014", "sochi 2014", "2014Sochi", "winter olympics"])

An example of the output stored in the .txt file: ('RT @Visa: There can only be one winner. Soak it in #TeamUSA, this is your #everywhere #Sochi2014 http://t.co/dVKYUln1r7', datetime.datetime(2014, 2, 15, 18, 9, 51), 111111111111111111).

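So, here's my question. If I can get the tweet IDs into a list, is there a way to iterate over those IDs, query the Twitter REST API, and retrieve the full JSON for each tweet? My hunch is yes, but I'm unsure about the implementation, mainly how to save the resulting data as a JSON file (since I've been using .txt files so far). Thanks in advance for reading.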

Best answer

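Figured it out. For anyone who has made this same terrible mistake (just collect ALL the data!), here is some code that uses a regular expression to extract the ID numbers and store them in a list: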

import re

# Read in your ugly text file.
with open('nameoffile.txt', 'r') as tweet_file:
    tweet_string = tweet_file.read()

# Find all the ID numbers with a regex (the IDs here are 18 digits long).
id_finder = re.compile(r'[0-9]{18}')

# Go through the tweet_string object and find all
# the IDs that meet the regex criteria.
idList = id_finder.findall(tweet_string)
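
As a quick sanity check, the same idea can be tried on a single record in the format shown in the question. This is only a sketch: the `sample` string is a shortened copy of the question's output, and the `\b` word boundaries and the de-duplication step are additions that were not part of the original answer.

```python
import re

# Shortened sample record in the format the question's listener wrote.
sample = (
    "('RT @Visa: There can only be one winner. #Sochi2014 http://t.co/dVKYUln1r7', "
    "datetime.datetime(2014, 2, 15, 18, 9, 51), 111111111111111111)"
)

# Assumption: \b anchors are added so the pattern cannot match
# 18 digits taken from the middle of a longer run of digits.
id_finder = re.compile(r'\b[0-9]{18}\b')

ids = id_finder.findall(sample)

# Drop repeated IDs while keeping their first-seen order.
unique_ids = list(dict.fromkeys(ids))
print(unique_ids)
```

Note that `findall` returns strings, not integers, which is fine for passing to the API.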

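Now you can iterate over the list idList and pass each ID to the API as a query (this assumes you have already authenticated and have an instance of the API class). You can then append the results to a list. Something like this: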

tweet_list = []
for tweet_id in idList:
    # get_status makes one request per ID and returns a tweepy Status object.
    tweet = api.get_status(tweet_id)
    tweet_list.append(tweet)
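
With a long ID list, one request per tweet can run into rate limits quickly. A possible alternative, not part of the original answer, is the bulk lookup endpoint, which accepts up to 100 IDs per request; Tweepy exposes it as `api.statuses_lookup` in 3.x and `api.lookup_statuses` in 4.x. The sketch below only handles the batching; `chunked`, `fetch_in_batches`, and the `lookup` parameter are hypothetical names introduced for illustration.

```python
def chunked(items, size=100):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(lookup, id_list, size=100):
    """Call `lookup` once per batch and collect whatever it returns.

    `lookup` is a stand-in for a bulk-lookup callable such as
    api.statuses_lookup (Tweepy 3.x) or api.lookup_statuses (Tweepy 4.x).
    """
    statuses = []
    for batch in chunked(id_list, size):
        statuses.extend(lookup(batch))
    return statuses
```

Assuming an authenticated `api` object, usage would look like `tweet_list = fetch_in_batches(api.statuses_lookup, idList)`.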

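Important note: what gets appended to the tweet_list variable is a tweepy Status object, not raw JSON. I still need a workaround for that, but the problem above is solved.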
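
One possible workaround, offered as a sketch and not as part of the original answer: Tweepy Status objects keep the raw parsed payload as a dict in their `_json` attribute, so it can be serialized with the standard `json` module. The function name `save_statuses` and the one-object-per-line (JSON Lines) layout are assumptions made for this example.

```python
import json


def save_statuses(statuses, path):
    """Write one JSON object per line and return how many were written."""
    written = 0
    with open(path, 'w', encoding='utf-8') as out:
        for status in statuses:
            # _json holds the dict that the Status object was parsed from.
            out.write(json.dumps(status._json, ensure_ascii=False) + '\n')
            written += 1
    return written
```

For example, `save_statuses(tweet_list, 'tweets.jsonl')` would write the collected tweets to a file (the filename is illustrative) that can be read back one line at a time with `json.loads`.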

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/21996982/
