python - How do I add words from a text file to a dictionary depending on the speaker's name?

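I have a text file containing the script of the first act of Romeo and Juliet, and I want to count how many times each person says each word.

Three people speak in the text: Gregory, Sampson, and Abraham.

Basically, I want to make 3 different dictionaries, one for each of the three speakers (if that is even the best approach?), fill each dictionary with the words that person says, and then count how many times they say each word across the whole script.

How would I go about this? I think I can work out the word counting, but I am a bit confused about how to tell who said what and put it into the 3 separate dictionaries.

My output should look something like this (the numbers are not correct, it is just an example):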

Gregory - 
25: the
15: a
5: from
3: while
1: hello
etc

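where the number is how often the word occurs in the file.

So far I have written code that reads the text file, strips the punctuation, and compiles the text into a list. I also don't want to use any external modules; I'd like to do it the old-fashioned way so that I learn. Thanks.

You don't have to post exact code. Just explain what I need to do and hopefully I can figure it out. I am using Python 3.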

Best answer

import collections
import string

# Maps each speaker's name to a Counter of the words they say.
c = collections.defaultdict(collections.Counter)
speaker = None  # None collects text that belongs to no speaker

with open('/tmp/spam.txt') as f:
  for line in f:
    if not line.strip():
      # An empty line ends the current speech.
      speaker = None
      continue
    if line.count(' ') == 0 and line.strip().endswith(':'):
      # A single word followed by a colon starts a new speech;
      # you might want to refine this test for your file.
      speaker = line.strip()[:-1]
      continue
    # Strip punctuation, lowercase each word, and count it for the speaker.
    c[speaker].update(x.strip(string.punctuation).lower() for x in line.split())

Example output:

In [1]: run /tmp/spam.py

In [2]: c.keys()
Out[2]: [None, 'Abraham', 'Gregory', 'Sampson']

In [3]: c['Gregory'].most_common(10)
Out[3]: 
[('the', 7),
 ('thou', 6),
 ('to', 6),
 ('of', 4),
 ('and', 4),
 ('art', 3),
 ('is', 3),
 ('it', 3),
 ('no', 3),
 ('i', 3)]
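
Since the question asks for an "old-fashioned" approach, here is a minimal sketch of the same idea using only plain dictionaries instead of `collections`. It assumes the same file layout the answer above relies on (a speaker's name followed by a colon on its own line, and a blank line after each speech). The function names `count_words_by_speaker` and `format_report` and the sample lines are made up for illustration. Unlike the answer above, it tests the stripped line for spaces and skips tokens that are pure punctuation.

```python
import string


def count_words_by_speaker(lines):
    """Return {speaker: {word: count}} built from plain dicts only."""
    counts = {}
    speaker = None  # None collects text that belongs to no speaker
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # A blank line ends the current speech.
            speaker = None
            continue
        if ' ' not in stripped and stripped.endswith(':'):
            # A single word followed by a colon names the next speaker.
            speaker = stripped[:-1]
            continue
        # Fetch this speaker's dict, creating it on first use.
        words = counts.setdefault(speaker, {})
        for raw in stripped.split():
            word = raw.strip(string.punctuation).lower()
            if word:  # skip tokens that were only punctuation
                words[word] = words.get(word, 0) + 1
    return counts


def format_report(speaker, words):
    """Format one speaker's counts, most frequent word first."""
    report = [f"{speaker} -"]
    # Sort by descending count, then alphabetically to break ties.
    for word, n in sorted(words.items(), key=lambda kv: (-kv[1], kv[0])):
        report.append(f"{n}: {word}")
    return "\n".join(report)


# A tiny hand-made sample in the assumed layout; with a real file,
# pass the open file object instead of this list.
sample = [
    "Gregory:\n",
    "The quarrel is between our masters.\n",
    "\n",
    "Sampson:\n",
    "'Tis all one, I will show myself a tyrant.\n",
    "\n",
    "Gregory:\n",
    "The heads of the maids?\n",
]
counts = count_words_by_speaker(sample)
print(format_report("Gregory", counts["Gregory"]))
```

The outer dictionary is keyed by speaker, so there is no need to create three separate variables by hand; a new inner dictionary appears whenever a new name is seen.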

For "python - How do I add words from a text file to a dictionary depending on the speaker's name?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13003575/
