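python - Code to replace emoticons with "SAD" or "HAPPY" is not working properly

Tags: python nltk text-processing

I want to replace every happy emoticon in a text file with "HAPPY" and every sad emoticon with "SAD", but my code isn't working properly. It does detect the smiley (for now I'm only testing with :-)), but in the example below it doesn't replace the emoticon with the text; it just appends the text, and it appends it twice, for reasons I don't understand.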

dict_sad={":-(":"SAD", ":(":"SAD", ":-|":"SAD",  ";-(":"SAD", ";-<":"SAD", "|-{":"SAD"}
dict_happy={":-)":"HAPPY",":)":"HAPPY", ":o)":"HAPPY",":-}":"HAPPY",";-}":"HAPPY",":->":"HAPPY",";-)":"HAPPY"}

#THE INPUT TEXT#
a="guys beautifully done :-)" 

for i in a.split():
    for j in dict_happy.keys():
        if set(j).issubset(set(i)):
            print "HAPPY"
            continue
    for k in dict_sad.keys():
        if set(k).issubset(set(i)):
            print "SAD"
            continue
    if str(i)==i.decode('utf-8','replace'):
       print i

Input text

a="guys beautifully done :-)"              

Output ("HAPPY" appears twice, and the emoticon doesn't go away)

guys
-
beautifully
done
HAPPY
HAPPY
:-)

Expected output

guys
beautifully
done
HAPPY

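Best answer

You are turning each word and each emoticon into a set, which means you are only testing whether the emoticon's individual characters appear in the word. You probably want exact matches instead: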

for i in a.split():
    for j in dict_happy:
        if j == i:
            print "HAPPY"
            continue
    for k in dict_sad:
        if k == i:
            print "SAD"
            continue
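
To see why the original loop prints HAPPY twice, it can help to run both tests side by side. The following is a small sketch in Python 3 syntax (the variable names are illustrative and not from the post) that lists which emoticons pass the set-based test for the word `:-)` and which pass an exact comparison.

```python
# Emoticon keys copied from the question's dict_happy.
happy_keys = [":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"]
word = ":-)"

# Set-based test from the question: it only asks whether every
# character of the emoticon occurs somewhere in the word.
subset_matches = [e for e in happy_keys if set(e).issubset(set(word))]

# Exact comparison: the whole token must equal the emoticon.
exact_matches = [e for e in happy_keys if e == word]

print(subset_matches)  # [':-)', ':)']
print(exact_matches)   # [':-)']
```

Both `:-)` and `:)` pass the subset test, which accounts for the two HAPPY lines. The `continue` in the question only affects the inner loop, so the final `print i` still runs and prints the emoticon itself.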

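You can iterate over the dictionary directly; there is no need to call .keys(). You also aren't actually using the dictionary values, so a membership test is enough: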

for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    if word in dict_sad:
        print "SAD"

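At that point you might as well use sets instead of dictionaries. If you only need to know whether any emoticon occurs in the text, this can be reduced to: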

words = set(a.split())
if dict_happy.viewkeys() & words:
    print "HAPPY"
if dict_sad.viewkeys() & words:
    print "SAD"

This uses the dictionary view of the keys (viewkeys(), available in Python 2.7) as a set. Still, it would be better to use actual sets:

sad_emoticons = {":-(", ":(", ":-|", ";-(", ";-<", "|-{"}
happy_emoticons = {":-)", ":)", ":o)", ":-}", ";-}", ":->", ";-)"}

words = set(a.split())
if sad_emoticons & words:
    print "SAD"
if happy_emoticons & words:
    print "HAPPY"
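
The dictionary-view version relies on `viewkeys()`, which exists only in Python 2. In Python 3, `keys()` itself returns a view that supports set operations, so the same idea might look like the sketch below. The helper name `moods` is hypothetical, and the function only reports which labels apply; it does not replace anything in the text.

```python
# Dictionaries copied from the question.
dict_sad = {":-(": "SAD", ":(": "SAD", ":-|": "SAD",
            ";-(": "SAD", ";-<": "SAD", "|-{": "SAD"}
dict_happy = {":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY",
              ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
              ";-)": "HAPPY"}


def moods(text):
    """Return the set of labels whose emoticons occur as words in text.

    Hypothetical helper, not part of the original answer.
    """
    words = set(text.split())
    found = set()
    # In Python 3, dict.keys() is a set-like view, so & works directly.
    if dict_happy.keys() & words:
        found.add("HAPPY")
    if dict_sad.keys() & words:
        found.add("SAD")
    return found


print(moods("guys beautifully done :-)"))  # {'HAPPY'}
```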

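If you want the emoticon itself to disappear from the output, replaced by its label, you have to go through the words one by one: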

for word in a.split():
    if word in dict_happy:
        print "HAPPY"
    elif word in dict_sad:
        print "SAD"
    else:
        print word

Or, better still, combine the two dictionaries and use dict.get():

emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD", 
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY",":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY"
}

for word in a.split():
    print emoticons.get(word, word)

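Here I pass the current word in as both the lookup key and the default value; if the current word is not an emoticon, the word itself is printed, otherwise SAD or HAPPY is printed instead.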
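
If the goal is to get the rewritten text back as a string rather than print it word by word, the same `dict.get()` lookup can be wrapped in a function. This is a minimal sketch in Python 3 syntax; the name `replace_emoticons` and its `sep` parameter are assumptions for illustration, not part of the original answer.

```python
# Combined lookup table, as in the answer.
emoticons = {
    ":-(": "SAD", ":(": "SAD", ":-|": "SAD",
    ";-(": "SAD", ";-<": "SAD", "|-{": "SAD",
    ":-)": "HAPPY", ":)": "HAPPY", ":o)": "HAPPY",
    ":-}": "HAPPY", ";-}": "HAPPY", ":->": "HAPPY",
    ";-)": "HAPPY",
}


def replace_emoticons(text, sep=" "):
    """Swap each whitespace-separated emoticon for its label.

    Hypothetical helper: words that are not emoticons are kept
    unchanged, and the words are rejoined with `sep`.
    """
    return sep.join(emoticons.get(word, word) for word in text.split())


print(replace_emoticons("guys beautifully done :-)"))
# guys beautifully done HAPPY
```

Passing `sep="\n"` gives one word per line, matching the expected output in the question. Note that this only recognises emoticons that stand alone as words; an emoticon glued to a word, as in `done:-)`, is left untouched, and `split()` collapses runs of whitespace.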

The original question, "Code to replace emoticons with "SAD" or "HAPPY" is not working properly", is on Stack Overflow: https://stackoverflow.com/questions/26969747/
