python - Tweepy for loop for extracting IDs: something is a bit off

I have the following code, which extracts the follower ids for a list of usernames and puts them into a pandas DataFrame called new_followers_df:

twitter_handles = ["x", "y"]    

## Import New Twitter Followers

new_follower_ids = []
handles = []

for user in twitter_handles:

    while True:

        try:

            for page in tweepy.Cursor(api.followers_ids, screen_name= user).pages():
                new_follower_ids.extend(page)
                for ids in page:
                    handles.append(user)
        except tweepy.TweepError:
            time.sleep(60 * 15)
            continue

        except StopIteration:
            pass
        break

new_followers_df = pd.DataFrame({
    "Handles": handles,
    "Follower_ID": new_follower_ids})

If user x has 75,000 followers and user y has another 75,000, I calculate that it should take me 30 minutes to fetch all the followers of x and y.

That is because the API limits are 5,000 ids per cursor page and 15 calls per session, with a 15-minute wait between sessions.
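As a rough check on this arithmetic, here is a minimal sketch that counts the calls and the waiting time under the limits stated above. The function name and the calls_already_used parameter are made up for illustration, and the model is simplified: it assumes every page is full and that a wait always lasts a whole window.

```python
import math

IDS_PER_CALL = 5000      # ids returned per page, as stated in the question
CALLS_PER_WINDOW = 15    # calls allowed per rate-limit window
WINDOW_MINUTES = 15      # wait between windows

def estimate_wait_minutes(follower_counts, calls_already_used=0):
    """Return (calls_needed, minutes_spent_waiting) under the simplified model."""
    # Every user costs at least one call, even with zero followers.
    calls_needed = sum(max(1, math.ceil(n / IDS_PER_CALL)) for n in follower_counts)
    total_calls = calls_needed + calls_already_used
    windows = math.ceil(total_calls / CALLS_PER_WINDOW)
    # The first window needs no waiting; each later window costs one full wait.
    return calls_needed, max(windows - 1, 0) * WINDOW_MINUTES

print(estimate_wait_minutes([75_000, 75_000]))                        # (30, 15)
print(estimate_wait_minutes([75_000, 75_000], calls_already_used=1))  # (30, 30)
```

Under these assumptions a clean run needs 30 calls, which fit exactly into two windows, so a single extra call (for example one spent on testing) pushes the run into a third window.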

For some reason, though, the script takes much longer than that to finish. Is there something wrong with my for loop? Could it be related to StopIteration?

Thanks

Best answer

A few things could be going on.

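If you have been testing your program, you may already have used some of the calls in the 15-minute window for those tests.
pandas may need some time to put 150,000 values into a DataFrame.
I'm not entirely sure about this, but you use page twice (extend(page), then for ids in page). If page is a generator, using it twice could trigger the calls twice. This is only a guess, and I may be completely wrong.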
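The third guess can be checked locally without calling the API. The sketch below uses made-up ids and a hypothetical helper, iterate_twice, to show how a list and a generator behave when they are consumed twice the way the question's loop consumes page.

```python
def iterate_twice(page):
    """Consume `page` as the question's loop does: extend, then iterate again."""
    ids = []
    ids.extend(page)                # first pass over page
    handles = ["x" for _ in page]   # second pass over page
    return len(ids), len(handles)

list_page = [101, 102, 103]                     # made-up ids
generator_page = (i for i in [101, 102, 103])   # same ids, as a generator

print(iterate_twice(list_page))        # (3, 3)
print(iterate_twice(generator_page))   # (3, 0)
```

A list can be traversed any number of times without fetching anything again, while a generator is simply empty on the second pass. Printing type(page) inside your own loop shows which case applies; if it is a list, the double use is harmless.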

That said, you can restructure the code so that it works more elegantly, which will hopefully also reduce the slowness you are seeing.

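First, you don't have to handle the rate limit yourself. tweepy can do that for you when the API is initialized. Presumably you have the following line somewhere in your code: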

api = tweepy.API(auth)

If we change it to:

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

tweepy will wait whenever you hit the rate limit, and it will print a message telling you that it is waiting.
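Letting tweepy wait also avoids a side effect of the hand-written retry in the question: after an error, continue re-enters the for loop, which builds a new cursor that starts again at the first page. The sketch below mimics that pattern with stand-ins only (RateLimited, make_fake_pages and collect_with_restart are hypothetical names, and no tweepy call is involved), so it shows the control flow, not tweepy's actual behavior.

```python
class RateLimited(Exception):
    """Stand-in for the error raised when the rate limit is hit."""

def make_fake_pages(pages, fail_on_call):
    """Return a page source that raises once, on the given call number."""
    state = {"calls": 0}

    def fetch():
        for page in pages:
            state["calls"] += 1
            if state["calls"] == fail_on_call:
                raise RateLimited()
            yield page

    return fetch, state

def collect_with_restart(fetch):
    """Mimic the question's while True / continue retry."""
    ids = []
    while True:
        try:
            for page in fetch():   # a fresh iterator starts at page one again
                ids.extend(page)
        except RateLimited:
            continue               # the question's code sleeps 15 minutes here
        break
    return ids

fetch, state = make_fake_pages([[1, 2], [3, 4], [5, 6]], fail_on_call=3)
ids = collect_with_restart(fetch)
print(ids)             # [1, 2, 3, 4, 1, 2, 3, 4, 5, 6]
print(state["calls"])  # 6
```

If the error arrives partway through a user's pages, the pages fetched before it are requested and stored a second time, which costs extra calls and leaves duplicate ids in the result.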

Once that is done, let's adjust your code a little:

twitter_handles = ["x", "y"]    

new_follower_ids = []
handles = []

for user in twitter_handles:
    current_user_followers = []
    for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
        current_user_followers.extend(page)

    new_follower_ids.extend(current_user_followers)
    handles.extend([user for _ in current_user_followers])

new_followers_df = pd.DataFrame({
    "Handles": handles,
    "Follower_ID": new_follower_ids})

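By tracking the current user's followers inside the for loop, we only need to extend the handles list once at the end, after all of that user's followers have been fetched. Since we then know how many followers the user has, we can add user to handles once for each follower.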
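The same bookkeeping can be tried without API access. In this sketch, collect_followers and fake_data are hypothetical: fetch_pages stands in for the tweepy cursor and returns made-up pages of ids, and [user] * len(current) is an equivalent shorthand for the list comprehension used above.

```python
def collect_followers(twitter_handles, fetch_pages):
    """Build two parallel lists: one handle per follower id."""
    handles, follower_ids = [], []
    for user in twitter_handles:
        current = []
        for page in fetch_pages(user):   # stand-in for the cursor's pages
            current.extend(page)
        follower_ids.extend(current)
        # One copy of the handle per follower keeps both lists the same length.
        handles.extend([user] * len(current))
    return handles, follower_ids

fake_data = {"x": [[1, 2], [3]], "y": [[4, 5]]}   # made-up pages of ids
handles, follower_ids = collect_followers(["x", "y"], lambda user: fake_data[user])
print(handles)        # ['x', 'x', 'x', 'y', 'y']
print(follower_ids)   # [1, 2, 3, 4, 5]
```

Because the two lists always have equal length, they can be passed to pd.DataFrame as two columns exactly as in the code above.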

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/44782487/