python - PRAW/Tweepy: filtering keywords

Tags: python python-3.x ubuntu tweepy praw

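I'm having some trouble filtering my PRAW results. I want to exclude posts whose titles contain keywords such as [request], [off topic] or [nsfw], so that posts like these from the PRAW results are not tweeted via Tweepy. I've been searching the documentation but couldn't find anything on the PRAW website.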

Here is my code:

import sqlite3
import time

# reddit, SUB, POST_LIMIT and tweeter() are defined elsewhere in the script.

def poster():
    conn = sqlite3.connect('jb_id.db')
    c = conn.cursor()
    toTweet = []
    for submission in reddit.subreddit(SUB).hot(limit=POST_LIMIT):
        if not submission.stickied and len(submission.title) < 255:
            url = submission.shortlink
            title = submission.title
            udate = time.strftime("%Y-%m-%d %X", time.gmtime(submission.created_utc))

            try:
                # This keeps a record of the posts in the database
                c.execute("INSERT INTO posts (id, title, udate) VALUES (?, ?, ?)",
                          (url, title, udate))
                conn.commit()

                message = title + " " + url
                print(message)
                toTweet.append(message)

            except sqlite3.IntegrityError:
                # This means the post was already tweeted and is ignored
                print("Duplicate", url)

    c.close()
    conn.close()
    tweeter(toTweet)

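As you can see, I already exclude stickied posts and titles of 255 characters or more. Is there a way to also filter out Reddit posts from the PRAW results using the keywords mentioned above? Thanks!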
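The duplicate check in the code above relies on the `INSERT` failing with `sqlite3.IntegrityError` when the same `id` is stored twice. That only happens if `posts.id` has a uniqueness constraint; the question never shows the schema, so the `CREATE TABLE` below is an assumption, as are the helper name `record_post` and the `redd.it` link (both hypothetical). A minimal sketch against an in-memory database:

```python
import sqlite3

# Assumed schema: the IntegrityError branch only fires if `id` is a
# PRIMARY KEY (or UNIQUE). The question does not show the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT, udate TEXT)")


def record_post(conn, url, title, udate):
    """Insert a post; return True if it is new, False if already recorded."""
    try:
        conn.execute(
            "INSERT INTO posts (id, title, udate) VALUES (?, ?, ?)",
            (url, title, udate),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Same id already stored: the post was tweeted before.
        return False


# Hypothetical shortlink, used twice to trigger the duplicate path.
first = record_post(conn, "https://redd.it/abc123", "Hello", "2019-11-11 10:00:00")
second = record_post(conn, "https://redd.it/abc123", "Hello", "2019-11-11 10:00:00")
print(first, second)  # True False
```

Without a `PRIMARY KEY` or `UNIQUE` constraint on `id`, both inserts would succeed and every post would be tweeted again on each run.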

Best answer

Make a list of keywords that should not appear in the submission title:

bad_keywords = "[request]", "[off topic]", "[nsfw]"

If the submission title contains any item from the list, skip to the next submission:

title_lowercase = submission.title.lower()
if any(x in title_lowercase for x in bad_keywords):
    continue
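The check works because the title is lowercased before the substring test, so the keywords themselves must be written in lowercase. Here is a minimal sketch of the same check pulled into a standalone function; `has_bad_keyword` and `BAD_KEYWORDS` are hypothetical names, and the titles are made-up examples:

```python
# Keywords are lowercase because the title is lowercased before
# comparing, which makes the match case-insensitive.
BAD_KEYWORDS = ("[request]", "[off topic]", "[nsfw]")


def has_bad_keyword(title, bad_keywords=BAD_KEYWORDS):
    """Return True if any bad keyword appears anywhere in the title."""
    title_lowercase = title.lower()
    return any(keyword in title_lowercase for keyword in bad_keywords)


print(has_bad_keyword("[NSFW] Some title"))       # True
print(has_bad_keyword("[Request] Need a tweak"))  # True
print(has_bad_keyword("New release announced"))   # False
```

This is a plain substring match: a title tagged `[off-topic]` or `NSFW` without brackets would not be caught unless those variants are added to the tuple.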

I'd combine this with your other exclusions to reduce indentation and make the loop more readable:

bad_title = any(x in title_lowercase for x in bad_keywords)
# Skip the submission if ANY one of the exclusions applies
skip_submission = submission.stickied or len(submission.title) >= 255 or bad_title
if skip_submission:
    continue
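The exclusions have to be joined with `or`: with `and`, a post would only be skipped when it is stickied, too long and tagged all at once, so almost nothing would be filtered. The sketch below checks the predicate against stand-in objects that carry only the two attributes it reads (`stickied`, `title`); they are `SimpleNamespace` stubs, not real PRAW submissions, and `should_skip` is a hypothetical name:

```python
from types import SimpleNamespace

BAD_KEYWORDS = ("[request]", "[off topic]", "[nsfw]")
MAX_TITLE_LENGTH = 255  # the question keeps only titles shorter than this


def should_skip(submission, bad_keywords=BAD_KEYWORDS):
    """Skip if ANY exclusion applies: stickied, too long, or bad keyword."""
    title_lowercase = submission.title.lower()
    bad_title = any(keyword in title_lowercase for keyword in bad_keywords)
    return (
        submission.stickied
        or len(submission.title) >= MAX_TITLE_LENGTH
        or bad_title
    )


# Stand-ins for PRAW submissions with only the attributes used above.
posts = [
    SimpleNamespace(stickied=True, title="Weekly thread"),
    SimpleNamespace(stickied=False, title="[Off Topic] Cats"),
    SimpleNamespace(stickied=False, title="x" * 300),
    SimpleNamespace(stickied=False, title="Normal post"),
]
print([should_skip(p) for p in posts])  # [True, True, True, False]
```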

Full solution

import sqlite3
import time

# reddit, SUB, POST_LIMIT and tweeter() are defined elsewhere in the script.

def poster():
    conn = sqlite3.connect('jb_id.db')
    c = conn.cursor()
    toTweet = []

    bad_keywords = "[request]", "[off topic]", "[nsfw]"

    for submission in reddit.subreddit(SUB).hot(limit=POST_LIMIT):
        title = submission.title
        title_lowercase = title.lower()

        bad_title = any(x in title_lowercase for x in bad_keywords)
        # Skip the submission if ANY one of the exclusions applies
        skip_submission = submission.stickied or len(title) >= 255 or bad_title

        if skip_submission:
            continue

        url = submission.shortlink
        udate = time.strftime("%Y-%m-%d %X", time.gmtime(submission.created_utc))

        try:
            # This keeps a record of the posts in the database
            c.execute("INSERT INTO posts (id, title, udate) VALUES (?, ?, ?)",
                      (url, title, udate))
            conn.commit()

            message = title + " " + url
            print(message)
            toTweet.append(message)

        except sqlite3.IntegrityError:
            # This means the post was already tweeted and is ignored
            print("Duplicate", url)

    c.close()
    conn.close()
    tweeter(toTweet)
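Because `poster()` talks to Reddit, SQLite on disk and Twitter at once, it is hard to try out in isolation. One option is to move the filtering and de-duplication into a function that takes any iterable of submissions and an open connection. In this sketch `select_posts` and `fake` are hypothetical names, the `posts` schema is assumed (the question does not show it), and the stubs only mimic the four PRAW attributes the loop reads:

```python
import sqlite3
import time
from types import SimpleNamespace

BAD_KEYWORDS = ("[request]", "[off topic]", "[nsfw]")


def select_posts(submissions, conn, bad_keywords=BAD_KEYWORDS):
    """Return 'title url' messages for new, non-excluded submissions."""
    c = conn.cursor()
    to_tweet = []
    for submission in submissions:
        title = submission.title
        bad_title = any(k in title.lower() for k in bad_keywords)
        # Skip if any exclusion applies.
        if submission.stickied or len(title) >= 255 or bad_title:
            continue
        url = submission.shortlink
        udate = time.strftime("%Y-%m-%d %X", time.gmtime(submission.created_utc))
        try:
            c.execute(
                "INSERT INTO posts (id, title, udate) VALUES (?, ?, ?)",
                (url, title, udate),
            )
            conn.commit()
            to_tweet.append(title + " " + url)
        except sqlite3.IntegrityError:
            # Already recorded, so it was tweeted on an earlier run.
            print("Duplicate", url)
    c.close()
    return to_tweet


# Assumed schema with a PRIMARY KEY so duplicates raise IntegrityError.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT, udate TEXT)")


def fake(title, link, stickied=False):
    """Stub with only the attributes select_posts reads (not a PRAW object)."""
    return SimpleNamespace(title=title, shortlink=link,
                           stickied=stickied, created_utc=0)


batch = [
    fake("Rules", "https://redd.it/a1", stickied=True),
    fake("[NSFW] Something", "https://redd.it/a2"),
    fake("Good post", "https://redd.it/a3"),
]
print(select_posts(batch, conn))  # ['Good post https://redd.it/a3']
print(select_posts(batch, conn))  # prints "Duplicate https://redd.it/a3", then []
```

In the real script the call would be along the lines of `tweeter(select_posts(reddit.subreddit(SUB).hot(limit=POST_LIMIT), conn))`, with `conn` opened on `jb_id.db`; passing the connection in also lets tests use `":memory:"` instead of the real database.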

A similar question, "python - PRAW/Tweepy: filtering keywords", is on Stack Overflow: https://stackoverflow.com/questions/58799355/
