python - How do I read an NLTK grammar from a GZIP file?

How can I load a compressed grammar file into NLTK? Loading the uncompressed grammar file works fine:

import nltk
parser = nltk.load_parser('grammar.fcfg')

However, since my grammar file is close to 100MB, I would like to load the much smaller compressed version, which is only 1MB, but that fails:

import nltk
parser = nltk.load_parser('grammar.tar.gz')

ValueError: Could not determine format for file:///grammar.tar.gz based on its file extension; use the "format" argument to specify the format explicitly.

Unfortunately, nltk.data.FORMATS does not list any compressed formats.

Best answer

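Inspecting the source of load_parser shows that it relies on nltk.data.find to open file:/// URLs.

That function automatically detects gzipped data by checking whether the path ends with .gz, and reads it with gzip.GzipFile.

For this to work, however, the data must be compressed directly with gzip, not packed with tar or by any other means.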
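As a rough sketch of that advice, the grammar can be recompressed with plain gzip from Python's standard library. The helper names gzip_grammar and load_gzipped_parser are made up for this illustration. The sketch assumes that NLTK works out the grammar format from the extension in front of .gz, which is why the file is named grammar.fcfg.gz rather than grammar.gz; if your NLTK version behaves differently, passing format='fcfg' explicitly may be needed.

```python
import gzip
import shutil


def gzip_grammar(src):
    """Compress src with plain gzip (no tar) and return the new path.

    The original extension is kept in front of ".gz"
    (grammar.fcfg -> grammar.fcfg.gz).
    """
    dst = src + ".gz"
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


def load_gzipped_parser(resource):
    """Hypothetical wrapper: hand the .gz resource to NLTK unchanged.

    Assumption: NLTK opens paths ending in .gz through gzip and
    infers the format from the extension before .gz.
    """
    import nltk  # imported lazily so gzip_grammar works without NLTK

    return nltk.load_parser(resource)
```

Calling gzip_grammar('grammar.fcfg') writes grammar.fcfg.gz next to the original, and that name can then be passed to load_parser in the same way as the uncompressed file.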

A similar question on reading an NLTK grammar from a GZIP file can be found on Stack Overflow: https://stackoverflow.com/questions/43459949/
