python - Counting bigrams from user input in Python 3?

Tags: python python-3.x

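I'm stuck and need a little guidance. I'm trying to teach myself Python using Grok Learning. Below are the problem, the sample output, and where I've got to with my code. I'd appreciate any hints that help me solve it.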

In linguistics, a bigram is a pair of adjacent words in a sentence. The sentence "The big red ball." has three bigrams: The big, big red, and red ball.

Write a program to read in multiple lines of input from the user, where each line is a space-separated sentence of words. Your program should then count up how many times each of the bigrams occur across all input sentences. The bigrams should be treated in a case insensitive manner by converting the input lines to lowercase. Once the user stops entering input, your program should print out each of the bigrams that appear more than once, along with their corresponding frequencies. For example:

Line: The big red ball
Line: The big red ball is near the big red box
Line: I am near the box
Line: 
near the: 2
red ball: 2
the big: 3
big red: 3

My code hasn't got very far and I'm really stuck, but this is where I am:

words = set()
line = input("Line: ")
while line != '':
  words.add(line)
  line = input("Line: ")

Am I on the right track? Please try not to import any modules and use only built-in features.

Thanks, Jeff

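Best answer

Let's start with a function that takes a sentence (with punctuation) and returns a list of all the lowercase bigrams found in it.

To do that, we first strip all non-alphanumeric characters from the sentence, convert every letter to lowercase, and then split the sentence on whitespace into a list of words: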

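import re

def bigrams(sentence):
    # Lowercase, then replace every non-word character with a space.
    text = re.sub(r'\W', ' ', sentence.lower())
    words = text.split()
    # Pair each word with its successor (a lazy zip object in Python 3).
    return zip(words, words[1:])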

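We use the standard-library re module to replace non-alphanumeric characters with spaces via a regular expression, and the built-in zip function to pair consecutive words. (We zip the word list with the same list shifted by one element.)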
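
To make the pairing step concrete, here is a minimal sketch of what zip does with a list and its shifted copy. The word list is simply the example sentence from the question, and the variable names `shifted` and `pairs` are illustrative, not part of the answer's code.

```python
# Illustrative only: pair each word with the word that follows it.
words = ["the", "big", "red", "ball"]

# words[1:] is the same list without its first element, so zip lines up
# each word with its successor and stops when the shorter list runs out.
shifted = words[1:]
pairs = list(zip(words, shifted))  # list() because zip is lazy in Python 3

print(pairs)
```

A four-word sentence therefore yields three pairs, which matches the definition of a bigram in the exercise.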

Now we can test it (in Python 3, zip returns a lazy iterator, so wrap the call in list() to display the pairs):

>>> list(bigrams("The big red ball"))
[('the', 'big'), ('big', 'red'), ('red', 'ball')]
>>> list(bigrams("THE big, red, ball."))
[('the', 'big'), ('big', 'red'), ('red', 'ball')]
>>> list(bigrams(" THE  big,red,ball!!?"))
[('the', 'big'), ('big', 'red'), ('red', 'ball')]

Next, to count the bigrams found across the sentences, you can use collections.Counter.

For example, like this:

from collections import Counter

counts = Counter()
for line in ["The big red ball", "The big red ball is near the big red box", "I am near the box"]:
    counts.update(bigrams(line))

We get (the order of entries with equal counts may differ between Python versions):

>>> counts
Counter({('the', 'big'): 3, ('big', 'red'): 3, ('red', 'ball'): 2, ('near', 'the'): 2, ('red', 'box'): 1, ('i', 'am'): 1, ('the', 'box'): 1, ('ball', 'is'): 1, ('am', 'near'): 1, ('is', 'near'): 1})

Now we just need to print the ones that occur more than once:

for bigr, cnt in counts.items():
    if cnt > 1:
        print("{0[0]} {0[1]}: {1}".format(bigr, cnt))
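
The loop above prints the bigrams in whatever order the Counter yields them. If a predictable order is wanted, one option is to sort before printing. The sketch below sorts by ascending count and then alphabetically; that ordering is my own assumption, since the exercise does not state which order it expects. The helper name `repeated_bigrams` and the hard-coded sample data are hypothetical.

```python
from collections import Counter

# Hypothetical sample data, a subset of the counts shown above.
counts = Counter({
    ("the", "big"): 3, ("big", "red"): 3,
    ("red", "ball"): 2, ("near", "the"): 2,
    ("red", "box"): 1,
})

def repeated_bigrams(counts):
    """Return 'w1 w2: n' lines for bigrams seen more than once,
    ordered by ascending count, then alphabetically by bigram."""
    repeated = [(bigr, cnt) for bigr, cnt in counts.items() if cnt > 1]
    repeated.sort(key=lambda item: (item[1], item[0]))
    return ["{} {}: {}".format(bigr[0], bigr[1], cnt) for bigr, cnt in repeated]

lines = repeated_bigrams(counts)
for line in lines:
    print(line)
```

Sorting on the tuple (count, bigram) keeps the result the same regardless of the order in which the sentences were entered.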

Putting it all together, with a loop for user input instead of a fixed list:

import re
from collections import Counter

def bigrams(sentence):
    text = re.sub(r'\W', ' ', sentence.lower())
    words = text.split()
    return zip(words, words[1:])

counts = Counter()
while True:
    line = input("Line: ")
    if not line:
        break
    counts.update(bigrams(line))

for bigr, cnt in counts.items():
    if cnt > 1:
        print("{0[0]} {0[1]}: {1}".format(bigr, cnt))

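Output (the order of the printed lines follows the Counter's iteration order and may vary between Python versions):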

Line: The big red ball
Line: The big red ball is near the big red box
Line: I am near the box
Line: 
near the: 2
red ball: 2
big red: 3
the big: 3
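
Since the question asks to avoid imports, here is a rough sketch of the same idea using only built-ins: a plain dict replaces Counter, and str.split replaces the regular expression. It assumes the input really is space-separated words with no punctuation to strip, as the exercise states. The names `count_bigrams` and `read_lines` are hypothetical, and the demonstration uses the sample sentences rather than live input.

```python
def bigrams(sentence):
    """Return the lowercase bigrams of a space-separated sentence."""
    words = sentence.lower().split()
    return list(zip(words, words[1:]))

def count_bigrams(lines):
    """Count bigrams across all lines using a plain dict."""
    counts = {}
    for line in lines:
        for pair in bigrams(line):
            # dict.get supplies 0 the first time a bigram is seen.
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def read_lines(prompt="Line: "):
    """Yield lines typed by the user until an empty line is entered."""
    line = input(prompt)
    while line != "":
        yield line
        line = input(prompt)

# Demonstration with the sample sentences instead of live input;
# pass read_lines() to count_bigrams to read from the user instead.
sample = [
    "The big red ball",
    "The big red ball is near the big red box",
    "I am near the box",
]
counts = count_bigrams(sample)
for (first, second), cnt in counts.items():
    if cnt > 1:
        print("{} {}: {}".format(first, second, cnt))
```

Calling count_bigrams(read_lines()) would give the interactive behaviour of the full program above, without importing anything.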

Source: this question, "python - Counting bigrams from user input in Python 3?", on Stack Overflow: https://stackoverflow.com/questions/45638131/
