python - Validating the correct use of "a" and "an" in English text - Python

Tags: python grammar


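I want to create a program that reads text from a file and points out where "a" and "an" are used incorrectly. As far as I know, the general rule is that "an" is used when the next word starts with a vowel. But it should also take into account that there are some exceptions, which should also be read from a file.

Could someone give me some tips and tricks on how I should get started with this? Functions that might help would be welcome.

I would be very happy :-)

I'm new to Python.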

Best answer

Here is a solution where correctness is defined as: `an` comes before a word that starts with a vowel sound, otherwise `a` may be used:

#!/usr/bin/env python
import itertools
import re
import sys

try:
    from future_builtins import map, zip
except ImportError: # Python 3 (or old Python versions)
    map, zip = map, zip
from operator import methodcaller

import nltk  # $ pip install nltk
from nltk.corpus import cmudict  # >>> nltk.download('cmudict')

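def starts_with_vowel_sound(word, pronunciations=cmudict.dict()):
    # cmudict maps a word to a list of pronunciations (lists of ARPAbet
    # phonemes); vowel phonemes end in a stress digit, e.g. 'AW1'.
    # Returns None (falsy) if the word is not in the dictionary.
    for syllables in pronunciations.get(word, []):
        return syllables[0][-1].isdigit()  # use only the first one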

def check_a_an_usage(words):
    # iterate over words pairwise (recipe from itertools)
    # note: ignore Unicode case-folding (`.casefold()`)
    a, b = itertools.tee(map(methodcaller('lower'), words))
    next(b, None)
    for article, w in zip(a, b):
        if article in ('a', 'an') and re.match(r'\w+$', w):
            valid = (article == 'an') if starts_with_vowel_sound(w) else (article == 'a')
            yield valid, article, w

# note: you could use nltk to split text into paragraphs, sentences, words
pairs = ((a, w)
         for sentence in sys.stdin.readlines() if sentence.strip() 
         for valid, a, w in check_a_an_usage(nltk.wordpunct_tokenize(sentence))
         if not valid)

print("Invalid indefinite article usage:")
print('\n'.join(map(" ".join, pairs)))
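
To make the vowel-sound test above easier to follow without downloading the corpus, here is a minimal sketch that runs the same check against a tiny hand-written table. `SAMPLE_PRONUNCIATIONS` and `expected_article` are hypothetical names introduced for illustration, and the phoneme lists are assumed examples in the shape that `cmudict.dict()` returns, not entries copied from the real dictionary.

```python
# Hand-written ARPAbet-style entries in the shape cmudict.dict() returns:
# word -> list of pronunciations, each a list of phonemes.
# These are illustrative assumptions, not copied from the real corpus.
SAMPLE_PRONUNCIATIONS = {
    "hour": [["AW1", "ER0"]],
    "house": [["HH", "AW1", "S"]],
    "european": [["Y", "UH2", "R", "AH0", "P", "IY1", "AH0", "N"]],
}


def starts_with_vowel_sound(word, pronunciations=SAMPLE_PRONUNCIATIONS):
    """Return True/False for known words, None for unknown ones."""
    for phonemes in pronunciations.get(word.lower(), []):
        # Vowel phonemes end in a stress digit (0, 1 or 2).
        return phonemes[0][-1].isdigit()
    return None


def expected_article(word):
    # Unknown words fall back to "a", mirroring the behaviour of the answer.
    return "an" if starts_with_vowel_sound(word) else "a"


for word in ("hour", "house", "European", "xyzzy"):
    print(expected_article(word), word)
```

Because the check looks at the first phoneme rather than the first letter, "hour" gets `an` and "European" gets `a`. A word missing from the table silently gets `a`, which may or may not be what you want for real text.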

Example input (one sentence per line)

Validity is defined as `an` comes before a word that starts with a
vowel sound, otherwise `a` may be used.
Like "a house", but "an hour" or "a European" (from @Hyperboreus's comment http://stackoverflow.com/questions/20336524/gramatically-correct-an-english-text-python#comment30353583_20336524 ).
A AcRe, an AcRe, a rhYthM, an rhYthM, a yEarlY, an yEarlY (words from @tchrist's comment http://stackoverflow.com/questions/9505714/python-how-to-prepend-the-string-ub-to-every-pronounced-vowel-in-a-string#comment12037821_9505868 )
We have found a (obviously not optimal) solution." vs. "We have found an obvious solution (from @Hyperboreus answer)
Wait, I will give you an... -- he shouted, but dropped dead before he could utter the last word. (ditto)

Output

Invalid indefinite article usage:
a acre
an rhythm
an yearly

Why the last pair is invalid is not obvious; see Why is it “an yearly”?
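
If you would rather not depend on nltk, the approach described in the question (a first-letter rule plus exceptions read from a file) could be sketched roughly as follows. The file format (one `<article> <word>` pair per line, `#` for comments) and the names `load_exceptions` and `expected_article` are assumptions made for this example, not part of the answer above.

```python
import io

VOWELS = "aeiou"


def load_exceptions(lines):
    """Parse lines of the form '<article> <word>'; skip blanks and # comments."""
    exceptions = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        article, word = line.lower().split()
        exceptions[word] = article
    return exceptions


def expected_article(word, exceptions):
    word = word.lower()
    if word in exceptions:
        return exceptions[word]
    # Naive spelling-based rule: "an" before a vowel letter, "a" otherwise.
    return "an" if word[0] in VOWELS else "a"


# io.StringIO stands in for an open file, e.g. open("exceptions.txt").
exceptions = load_exceptions(io.StringIO("# article word\nan hour\na european\n"))

pairs = [("a", "hour"), ("an", "European"), ("an", "apple"), ("an", "house")]
for article, word in pairs:
    if article != expected_article(word, exceptions):
        print("Invalid:", article, word)
```

This is simpler but only as good as the exceptions list; the pronunciation-based check handles words such as "hour" and "European" without listing them by hand.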

Regarding python - Validating the correct use of "a" and "an" in English text - Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20336524/
