python - How can I use a dictionary comprehension to count the occurrences of each word across documents

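I have a Python list of lists full of text. It is like a set of words from each document: for each document I have a list, and then there is a list of all the documents.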

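Every list contains only unique words. My goal is to count the occurrences of each word across all the documents. I was able to do this successfully with the following code: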

for x in texts_list:
    for l in x:
        if l in term_appearance:
            term_appearance[l] += 1
        else:
            term_appearance[l] = 1

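But I would like to do the same thing with a dictionary comprehension. This is the first time I am trying to write one, and using earlier posts on Stack Overflow I have been able to write the following: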

from collections import defaultdict
term_appearance = defaultdict(int)

{{term_appearance[l] : term_appearance[l] + 1 if l else term_appearance[l] : 1 for l in x} for x in texts_list}

The earlier post, for reference:

Simple syntax error in Python if else dict comprehension

As suggested in that post, I also tried the following code:

{{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}

The code above printed empty lists and then raised the following traceback:

[]
[]
[]
[]
Traceback (most recent call last):
  File "term_count_fltr.py", line 28, in <module>
    {{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}
  File "term_count_fltr.py", line 28, in <setcomp>
    {{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}
TypeError: unhashable type: 'dict'

Any help improving my understanding would be much appreciated.

After seeing the error above, I also tried:

[{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list]

This runs without any errors, but the output is only empty lists.

Best answer

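As explained in the other answers, the problem is that a dictionary comprehension creates a new dictionary, so you cannot refer to that dictionary until it has been created. A dictionary comprehension cannot accumulate counts the way your loop does.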
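To make this concrete, here is a minimal sketch using a hypothetical sample `texts_list` (the data and the names `word`, `doc` and `per_document` are assumptions for illustration, not taken from the question). It mirrors the list-of-dicts attempt from the question: the value expression only reads from the separate `term_appearance` defaultdict and never writes a count back, so every value comes out as 0 + 1.

```python
from collections import defaultdict

# Hypothetical sample data: one list of unique words per document.
texts_list = [["apple", "banana"], ["banana", "cherry"], ["banana"]]

term_appearance = defaultdict(int)

# Each inner comprehension builds a brand-new dict. Nothing is written back
# to term_appearance, so term_appearance[word] is 0 every time it is read.
per_document = [
    {word: term_appearance[word] + 1 for word in doc}
    for doc in texts_list
]

print(per_document)
# [{'apple': 1, 'banana': 1}, {'banana': 1, 'cherry': 1}, {'banana': 1}]

# Reading a missing key from a defaultdict(int) inserts it with value 0,
# so the keys exist afterwards but no count was ever incremented.
print(dict(term_appearance))
# {'apple': 0, 'banana': 0, 'cherry': 0}
```

The result is one small dictionary per document rather than a single running total, which is why this approach cannot replace the accumulating loop.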

Given that, what you are doing is re-implementing what collections.Counter already does. You can simply use Counter. Example -

from collections import Counter
term_appearance = Counter()
for x in texts_list:
    term_appearance.update(x)

Demo -

>>> l = [[1,2,3],[2,3,1],[5,4,2],[1,1,3]]
>>> from collections import Counter
>>> term_appearance = Counter()
>>> for x in l:
...     term_appearance.update(x)
...
>>> term_appearance
Counter({1: 4, 2: 3, 3: 3, 4: 1, 5: 1})

If you really want to do this in some kind of comprehension, you can do:

from collections import Counter
term_appearance = Counter()
[term_appearance.update(x) for x in texts_list]

Demo -

>>> l = [[1,2,3],[2,3,1],[5,4,2],[1,1,3]]
>>> from collections import Counter
>>> term_appearance = Counter()
>>> [term_appearance.update(x) for x in l]
[None, None, None, None]
>>> term_appearance
Counter({1: 4, 2: 3, 3: 3, 4: 1, 5: 1})

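The output [None, None, None, None] comes from the list comprehension itself, which evaluates to that list; it is shown here only because the code was run interactively. If you run this in a script as python <script>, that output is simply discarded.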

You can also use itertools.chain.from_iterable() to create a flattened iterable from your texts_list and then pass that to Counter. Example:

from collections import Counter
from itertools import chain
term_appearance = Counter(chain.from_iterable(texts_list))

Demo -

>>> from collections import Counter
>>> from itertools import chain
>>> term_appearance = Counter(chain.from_iterable(l))
>>> term_appearance
Counter({1: 4, 2: 3, 3: 3, 4: 1, 5: 1})
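For a word-based illustration, the sketch below applies the same `Counter(chain.from_iterable(...))` call to a hypothetical `texts_list` of strings (the sample data is an assumption). It also shows, for comparison only, one way a dictionary comprehension can produce the same totals by iterating over the vocabulary instead of accumulating; this variant is not part of the original answer and rescans every document for every word, so it is likely to be slower than `Counter` on large inputs.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample data: one list of unique words per document.
texts_list = [["apple", "banana"], ["banana", "cherry"], ["banana", "apple"]]

# Flatten all documents into one stream of words and count them.
term_appearance = Counter(chain.from_iterable(texts_list))

# Comparison only: a dict comprehension over the set of distinct words.
# It does not accumulate; it recomputes the total for each word from scratch.
vocabulary = set(chain.from_iterable(texts_list))
by_comprehension = {
    word: sum(doc.count(word) for doc in texts_list)
    for word in vocabulary
}

print(term_appearance.most_common(2))
# [('banana', 3), ('apple', 2)]
print(dict(term_appearance) == by_comprehension)
# True
```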

There is also another problem in your original code -

{{term_appearance[l] : term_appearance[l] + 1 if l else term_appearance[l] : 1 for l in x} for x in texts_list}

This is actually a set comprehension with a dictionary comprehension nested inside it.

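That is why you get the error TypeError: unhashable type: 'dict'. After the dictionary comprehension runs for the first time and creates a dict, Python tries to add that dict to the set. Dictionaries are not hashable, so the error is raised.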
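The behaviour can be reproduced in isolation with a short sketch (the sample `texts_list` and the constant value `1` are assumptions chosen to keep the example small). The exact wording of the error message may differ between Python versions, so the sketch only prints it.

```python
# Hypothetical sample data: one list of unique words per document.
texts_list = [["apple", "banana"], ["banana", "cherry"]]

error_message = None
try:
    # The outer braces build a set; each element is a dict, and dicts
    # are unhashable, so adding the first one to the set fails.
    result = {{word: 1 for word in doc} for doc in texts_list}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Square brackets on the outer level build a list of dicts instead,
# which avoids the error but still gives one dict per document.
result = [{word: 1 for word in doc} for doc in texts_list]
print(result)
# [{'apple': 1, 'banana': 1}, {'banana': 1, 'cherry': 1}]
```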

This question and answer is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/33005949/
