python - Why does my Python bot sometimes post too much?

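I wrote a bot that fetches posts from Reddit and posts them to a Twitter account. But sometimes, and I don't know why, it posts twice in a row instead of once every 3 hours. I suspect it is because I did something like this: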

do stuff:
    if stuff doesnt already exist:
        do other stuff
    else:
        do stuff

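I really think this is bad practice, but I can't figure out how else to keep it running in an infinite loop while still trying to fetch a post that hasn't been posted before.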

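There are two points in my code where I "rerun" the whole code after a check. One is when the post fetched from Reddit is not an image; the other is when the fetched post has already been posted before (its ID is stored in a JSON file for exactly this check).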

I hope someone understands what I mean. Thanks in advance.

import time
import tweepy
import datetime
import praw
import urllib.request
import os
import json


def Mainbot():
    reddit = praw.Reddit(client_id='X',
                          client_secret='X',
                          user_agent='RedditFetchBot by FlyingThunder')

    def Tweet(postinfo):
        auth = tweepy.OAuthHandler("X", "X")
        auth.set_access_token("X", "X")
        api = tweepy.API(auth)
        try:
            api.update_with_media("local-filename.jpg", postinfo)
        except:
            print("not a file post")
            Mainbot()            #check 1


    post = reddit.subreddit('okbrudimongo').random()
    x = post.id

    with open('data.json', 'r') as e:
        eread = e.read()
        if x not in eread:
            with open('data.json', 'a') as f:
                json.dump(x, f)
                f.close()
                e.close()
        else:
            e.close()
            print("already posted")
            Mainbot()      #check 2

    print(post.url + " " + post.title)
    urllib.request.urlretrieve(post.url, "local-filename.jpg")
    Tweet(postinfo=post.title+" (https://www.reddit.com" + post.permalink+")")
    try:
        time.sleep(5)
        os.remove("local-filename.jpg")
    except:
        print("Datei nicht vorhanden")

def loop():
    time.sleep(1800)
    print("still running")
    print(datetime.datetime.now())

while True:
    Mainbot()
    loop()
    loop()
    loop()
    loop()
    loop()
    loop()

By the way, this is what it prints. I added print checks to see what goes wrong, and here you can see what it says when it posts twice:

still running
2019-09-24 13:27:23.437152
still running
2019-09-24 13:57:23.437595
already posted
https://i.redd.it/xw38s1qrmlh31.jpg Führ Samstag bai ihm
https://i.redd.it/nnaxll9gjwf31.jpg Sorri Mamer
still running
2019-09-24 14:27:39.913651
still running
2019-09-24 14:57:39.913949
still running
2019-09-24 15:27:39.914013

Best answer

There is a lot to unpack here.

if x not in eread:
    ...
else:
    ...
    Mainbot()    # <--- this line

In the snippet above, you check whether post.id is already in your file. If it is, you call Mainbot() again, which gives the bot another chance to post a tweet.

However, this line

Tweet(postinfo=post.title+" (https://www.reddit.com" + post.permalink+")")

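sits outside your if-else check, which means a tweet is posted regardless of whether post.id is in your file. Once the recursive call returns, the outer call simply carries on and tweets the post that was already in the file.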

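I also want to address the way you loop the bot. Your use of recursion is what causes the double posting, and if several posts in a row end up in the "else" branch shown above, it can technically post many tweets at once as the recursive calls unwind.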
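The double post can be reproduced without Reddit or Twitter. The sketch below is only an illustration, not the bot itself: fetch is a hypothetical stand-in for reddit.subreddit(...).random(), seen stands in for the contents of data.json, and tweets stands in for the Twitter account.

```python
def recursive_bot(fetch, seen, tweets):
    """Mimics the question's structure: retry by calling itself."""
    post = fetch()
    if post in seen:
        recursive_bot(fetch, seen, tweets)  # retry, then fall through below
    else:
        seen.add(post)
    tweets.append(post)  # runs in every frame, including the one that retried


def iterative_bot(fetch, seen, tweets):
    """Retry with a loop, so the tweet step runs exactly once."""
    post = fetch()
    while post in seen:
        post = fetch()
    seen.add(post)
    tweets.append(post)


def make_fetch(posts):
    """Hypothetical stand-in for subreddit.random(): yields posts in order."""
    it = iter(posts)
    return lambda: next(it)


recursive_tweets = []
recursive_bot(make_fetch(["old", "new"]), {"old"}, recursive_tweets)
print(recursive_tweets)  # ['new', 'old']

iterative_tweets = []
iterative_bot(make_fetch(["old", "new"]), {"old"}, iterative_tweets)
print(iterative_tweets)  # ['new']
```

The recursive version tweets the new post from the inner call and then the already-posted one from the outer call, which matches the "already posted" line followed by two URLs in the question's output.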

Also, if you use with open(...) as f:, you don't need to call f.close(). The file is closed automatically when the with block exits.
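A quick way to convince yourself of this is to check the file object's closed attribute. The temporary path here is just an assumption for the demo and is not the bot's data.json.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "data.json")

with open(path, "w") as f:
    f.write('"abc123"')
    print(f.closed)  # False: still open inside the block

print(f.closed)  # True: closed on leaving the block, no f.close() needed
```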

Here is a solution I came up with that should fix your problem without using recursion:

import time
import tweepy
import datetime
import praw
import urllib.request
import os
import json

def initBot():
    # this function logs into your reddit and twitter accounts
    # and returns their instances

    reddit = praw.Reddit(client_id='XXXX',
                          client_secret='XXXX',
                          user_agent='RedditFetchBot by FlyingThunder')
    auth = tweepy.OAuthHandler("XXXX", "XXXX")
    auth.set_access_token("XXXX",
                          "XXXX")
    twitter = tweepy.API(auth)

    return reddit, twitter

def Tweet(post):
    # this function simply tries to post a tweet
    # (it uses the module-level `twitter` object created in the __main__ block)
    postinfo = post.title + " (https://www.reddit.com" + post.permalink + ")"
    try:
        twitter.update_with_media("local-filename.jpg", postinfo)
    except Exception:
        print("not a file post " + post.permalink)

def Mainbot():
    while True:
        with open('data.json', 'r+') as e:  # 'r+' lets you read and write to a file
            eread = e.read()

            # This section loops until it finds a reddit submission
            # that's not in your file
            post = reddit.subreddit('okbrudimongo').random()
            x = post.id
            while x in eread:
                post = reddit.subreddit('okbrudimongo').random()
                x = post.id

            # add the post.id to the file
            json.dump(x, e)
            print(post.url + " " + post.title)

            # Get and tweet image
            urllib.request.urlretrieve(post.url, "local-filename.jpg")
            Tweet(post)

            # Remove image file
            try:
                time.sleep(5)
                os.remove("local-filename.jpg")
            except:
                print("Datei nicht vorhanden")

        # sleep for a total of three hours, but report status every 30 minutes
        for i in range(6):
            time.sleep(1800)
            print("still running")
            print(datetime.datetime.now())

if __name__ == "__main__":

    reddit, twitter = initBot()
    Mainbot()

I haven't tested this, since I don't have Twitter keys.
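One more thing that may be worth considering: calling json.dump(x, f) repeatedly on the same file writes one quoted string after another, so the file as a whole is not a single valid JSON document, and x in eread is a substring test on the raw text rather than an exact ID comparison. A minimal sketch of an alternative, assuming you are free to change the file format, is to keep all IDs in one JSON list and compare against a set. The helper names and the file name posted_ids.json are hypothetical, and the IDs are made up.

```python
import json
import os
import tempfile


def load_posted_ids(path):
    """Return the set of IDs stored at path, or an empty set if it is missing."""
    if not os.path.exists(path):
        return set()
    with open(path, "r") as f:
        return set(json.load(f))


def save_posted_ids(path, ids):
    """Rewrite the whole file as one JSON list of IDs."""
    with open(path, "w") as f:
        json.dump(sorted(ids), f)


# Usage with a scratch file (hypothetical path and made-up IDs).
path = os.path.join(tempfile.mkdtemp(), "posted_ids.json")
posted = load_posted_ids(path)      # empty on the first run
posted.add("abc123")
save_posted_ids(path, posted)

reloaded = load_posted_ids(path)
print("abc123" in reloaded)  # True
print("abc" in reloaded)     # False: exact match, not a substring match
```

This format is not compatible with an existing data.json written by the code above, so that file would need to be converted or started fresh.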

This is based on a similar question found on Stack Overflow, "Why does my Python bot sometimes post too much?": https://stackoverflow.com/questions/58082497/
