python - Problem parsing comments in PRAW with Python

Tags: python parsing comments

So, I'm making a small reddit bot that just searches comments for a term, but I'm getting strange results. I'm new to Python, so this code may be a bit confusing and amateurish.

#! /usr/bin/python

import praw
import pprint

user_agent = ("simple praw script for searching post terms in comments by /u/shadowfire452")
reddit = praw.Reddit(user_agent = user_agent)
reddit.login()
v_fixed = []
subreddit = reddit.get_subreddit('politics' and 'worldnews')

for submission in subreddit.get_hot(limit = 100):
    title = submission.title
    if " " in title.lower(): 
        v_fixed.append(title)
print "The following %d posts might not make much sense ..." % (len(v_fixed))
for fixed in v_fixed:
    print "\t%s" % (fixed)




flat_comment_generator = praw.helpers.flatten_tree(submission.comments)

for comment in flat_comment_generator:
    if "you" in comment.body:
        a = []
        commentz = comment.body
        a.append(commentz)
        print comment.body
        print ("I found %s comments with 'you' in it out of 100 posts") % (len(a))
    else:
        print "I found no comments with 'you' in it"

When I run it, I get:

I found 1 comments with ' ' in it out of 100 posts
I found no comments with ' ' in it
I found no comments with ' ' in it
I found no comments with ' ' in it
I found no comments with ' ' in it

This is obviously a problem, because I'm getting contradictory answers, and 5 replies to 1 request.

Best answer

import praw # simple interface to the reddit API, also handles rate limiting of requests
import re
from collections import deque 
from time import sleep

USERNAME  = ""
PASSWORD  = ""
USERAGENT = "bot/1.0 by USERNAME"

r = praw.Reddit(USERAGENT)
r.login(USERNAME, PASSWORD) # necessary if your bot will talk to people

cache = deque(maxlen = 200) # To make sure we don't duplicate effort

# Set of words to find in the comment body.
# A set gives fast membership tests in word_check.
words = set(["these", "are", "the", "words", "to", "find"])

def word_check(comment_body, words):
    # Will split the comment_body into individual words and check each for membership in words

    # Split comment into words
    comment_words = comment_body.split()

    # Check each word for hot word and return True if found
    for word in comment_words:
        if word in words:
            return True

    # Return false if no words in words
    return False

def bot_action(comment, reply):
    print "Body:", comment.body
    print "Found word in:", comment.subreddit.display_name
    comment.reply(reply)

# Loop through comments
running = True
while running:
    comments = r.get_comments('politics', limit = None)  # avoid shadowing the built-in all()
    for comment in comments:
        # if comment id exists in cache, break
        if comment.id in cache:
            break
        cache.append(comment.id) # cache already found comment id
        # execute method for comment body and hotword(s)
        if word_check(comment.body, words):
            try:
                # action the bot to reply
                bot_action(comment, "Hello world")
            except KeyboardInterrupt:
                # stop both the inner and the outer loop on Ctrl-C
                running = False
                break
            except praw.errors.APIException as e:
                print "[ERROR]:", e
                print "Sleeping for 30 seconds"
                sleep(30)
            except Exception as e: # In reality you don't want to just catch everything like this, but this is toy code.
                print "[ERROR]:", e
                print "Blindly handling error"
                continue
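
As far as the posted code shows, the output in the question follows from where the statements sit. `a = []` and both `print` calls are inside the `for` loop, so the list is reset and one message is printed for every comment. `flatten_tree(submission.comments)` also runs after the first loop has finished, so only the last submission's comments are examined. Below is a minimal sketch of collecting matches first and reporting once. It uses plain strings in place of `comment.body` values, and the helper names `find_matching` and `summarize` and the sample data are hypothetical, not part of PRAW or the original answer.

```python
# Plain strings stand in for comment.body values (hypothetical sample data).

def find_matching(bodies, term):
    """Return the bodies that contain term as a substring, ignoring case."""
    term = term.lower()
    return [body for body in bodies if term in body.lower()]


def summarize(bodies, term):
    """Build one summary line for the whole batch, not one per comment."""
    matches = find_matching(bodies, term)
    if matches:
        return "I found %d comments with '%s' in them" % (len(matches), term)
    return "I found no comments with '%s' in them" % term


sample_bodies = [
    "Do you agree with this?",
    "Nothing to see here.",
    "YOU again!",
]

# The list is built once and the message is printed once, after the loop.
print(summarize(sample_bodies, "you"))
```

In a real bot the strings would come from `comment.body`, as in the question. Note that this is a substring test, so "your" also counts as a match; the `word_check` function above compares whole whitespace-separated words instead.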

For this question (python - Problem parsing comments in PRAW with Python), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/21261756/
