python - Getting duplicate tweets from a user timeline using tweepy

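I'm trying to use tweepy to pull tweets from a list of accounts. I can fetch tweets, but for a given account I get a large number of duplicates. In some cases I pulled 400 tweets and about half of them were duplicates.

I checked the accounts on Twitter and confirmed that they are not simply posting the same content over and over. I also confirmed that they don't have a hundred-plus retweets that could account for this. When I look at the duplicated tweet objects themselves, everything is identical: the tweet ID is the same, created_at is the same, the retweet count is the same, and the @mentions and hashtags are the same. I can't see any difference. I suspect the problem is in my loop, but everything I've tried produces the same result.

Any ideas? I don't want to just deduplicate, because then I'd end up with far fewer tweets from some accounts.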

# A list of the accounts I want tweets from
friendslist = ["SomeAccount", "SomeOtherAccount"] 

# Where I store the tweet objects
friendstweets = []

# Loop that cycles through my list of accounts to add tweets to friendstweets
for f in friendslist:
    num_needed = 400 # The number of tweets I want from each account
    temp_list = []
    last_id = -1 # id of last tweet seen
    while len(temp_list) < num_needed:
        try:
            new_tweets = api.user_timeline(screen_name = f, count = 400, include_rts = True)
        except tweepy.TweepError as e:
            print("Error", e)
            break
        except StopIteration:
            break
        else:
            if not new_tweets:
                print("Could not find any more tweets!")
                break
        friendstweets.extend(new_tweets)
        temp_list.extend(new_tweets)
        last_id = new_tweets[-1].id
    print('Friend '+f+' complete.')

Best answer

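Your problem is on this line: while len(temp_list) < num_needed: . Every pass through the loop calls api.user_timeline with exactly the same arguments (last_id is updated but never passed to the request), so you fetch the same tweets for each user over and over until you have collected more than 400.

The fix I'd suggest is to remove the while loop and change the number of tweets requested from 400 to num_needed: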

new_tweets = api.user_timeline(screen_name = f, count = num_needed, include_rts = True)

Hope it works as expected.
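
If more tweets are needed than a single request returns (the v1.1 user timeline endpoint caps count at 200 per call), the loop can be kept as long as each request asks only for tweets older than the last one seen. The following is a minimal sketch of that idea, not the answerer's method. collect_timeline and fetch_page are hypothetical names introduced here for illustration; fetch_page is assumed to wrap api.user_timeline and to accept count and max_id keyword arguments.

```python
def collect_timeline(fetch_page, num_needed, page_size=200):
    """Collect up to num_needed tweets, paging backwards through a timeline.

    fetch_page is a hypothetical callable (an assumption of this sketch) that
    accepts count and, optionally, max_id, and returns a list of objects with
    an .id attribute, newest first. With tweepy it could be written as:

        lambda **kw: api.user_timeline(screen_name=f, include_rts=True, **kw)

    where api is an already authenticated tweepy.API instance.
    """
    collected = []
    max_id = None  # no upper bound on the first request
    while len(collected) < num_needed:
        kwargs = {"count": min(page_size, num_needed - len(collected))}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = fetch_page(**kwargs)
        if not page:
            break  # the timeline has no older tweets
        collected.extend(page)
        # max_id is inclusive, so subtract 1 to skip the tweet already seen.
        max_id = page[-1].id - 1
    return collected
```

The key difference from the loop in the question is that the id of the oldest tweet seen is actually passed back to the request, so each page starts where the previous one ended. tweepy also provides tweepy.Cursor, which handles this paging itself, for example tweepy.Cursor(api.user_timeline, screen_name=f, include_rts=True).items(num_needed); which of the two fits better depends on how much control over each request is wanted.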

This Q&A on getting duplicate tweets from a user timeline using tweepy is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/57376132/
