python - NLTK panlex_lite gives me an error

Tags: python nlp nltk

I'm trying to learn NLP in Python using NLTK.

A package called "panlex_lite" keeps giving me an error, so I tried the following:

import nltk
nltk.download('all', halt_on_error = False)

It gave me the following error:

[nltk_data]    | Downloading package panlex_lite to
[nltk_data]    |     /Users/Harshil/nltk_data...
[nltk_data]    |   Unzipping corpora/panlex_lite.zip.
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    nltk.download('all', halt_on_error = False)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 664, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 543, in incr_download
    for msg in self.incr_download(info.children, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 529, in incr_download
    for msg in self._download_list(info_or_id, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 572, in _download_list
    for msg in self.incr_download(item, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 549, in incr_download
    for msg in self._download_package(info, download_dir, force):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 638, in _download_package
    for msg in _unzip_iter(filepath, zipdir, verbose=False):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/downloader.py", line 2039, in _unzip_iter
    outfile.write(contents)
OSError: [Errno 22] Invalid argument

Is there any way to resolve this? I tried passing `halt_on_error = False`, but it still gives me the error.

Thanks.

Best answer

Here's a "dirty" trick:

$ rm /Users/Harshil/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/Harshil/nltk_data/corpora/panlex_lite
$ python

>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it were already installed.
>>> dler.download('all')
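If you'd rather not use the shell (for example on Windows), the two `rm` commands can be done from Python with the standard library. This is a minimal sketch; `remove_partial_panlex` is a hypothetical helper name, and it assumes the usual layout where panlex_lite is unpacked under `<nltk_data>/corpora/`.

```python
import shutil
from pathlib import Path


def remove_partial_panlex(data_dir):
    """Delete a leftover panlex_lite.zip and its half-extracted folder, if present.

    Equivalent to the two rm commands above; returns the paths it removed.
    """
    corpora = Path(data_dir).expanduser() / "corpora"
    removed = []

    # The downloaded archive (like: rm .../corpora/panlex_lite.zip)
    zip_path = corpora / "panlex_lite.zip"
    if zip_path.is_file():
        zip_path.unlink()
        removed.append(zip_path)

    # The partially unzipped directory (like: rm -r .../corpora/panlex_lite)
    folder = corpora / "panlex_lite"
    if folder.is_dir():
        shutil.rmtree(folder)
        removed.append(folder)

    return removed
```

For example, `remove_partial_panlex("/Users/Harshil/nltk_data")` before starting the `Downloader` session. Calling it again is harmless; it simply returns an empty list.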

Alternatively, try earthy:

pip install earthy

TL;DR:

import earthy
path_to_nltk_data = '/home/yourusername/nltk_data/'
earthy.download('all', path_to_nltk_data) # Excludes the third party (non-NLTK) packages.

To download only panlex_lite:

import earthy
earthy.download('panlex_lite', path_to_nltk_data)

To download all the third-party datasets that are not natively hosted in the nltk_data GitHub repository:

import earthy
earthy.download('third_party', path_to_nltk_data)
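If you'd rather not install earthy or set the downloader's private `_status_cache`, a similar effect can be sketched with NLTK's public `Downloader.packages()` and `Downloader.download()`: fetch every package one at a time and skip panlex_lite. This is a sketch under assumptions, not the answer's method. The helper names are hypothetical; it assumes that with the default `halt_on_error=True`, `download()` returns `False` when a package reports an error; and because the `OSError` in the traceback above escaped `download()` as an exception, that case is caught separately.

```python
from typing import Iterable, List


def select_package_ids(ids: Iterable[str], skip: Iterable[str] = ("panlex_lite",)) -> List[str]:
    """Return the package ids to fetch, dropping anything listed in skip."""
    skipped = set(skip)
    return [pkg_id for pkg_id in ids if pkg_id not in skipped]


def download_all_except(skip=("panlex_lite",), download_dir=None):
    """Download every NLTK data package individually, except those in skip.

    Returns the ids that failed so they can be retried or inspected later.
    """
    import nltk  # imported here so select_package_ids works without NLTK installed

    dler = nltk.downloader.Downloader()
    # packages() refreshes the index itself and yields Package objects with an .id
    all_ids = [pkg.id for pkg in dler.packages()]

    failed = []
    for pkg_id in select_package_ids(all_ids, skip):
        try:
            # Assumption: a False return means the package reported an error
            if not dler.download(pkg_id, download_dir=download_dir, quiet=True):
                failed.append(pkg_id)
        except OSError:
            # Errors like the Errno 22 above are raised rather than reported
            failed.append(pkg_id)
    return failed
```

Usage: `failed = download_all_except(download_dir="/Users/Harshil/nltk_data")`, then `print(failed)` to see what, if anything, still needs attention.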

Regarding python - NLTK panlex_lite gives me an error, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38128935/
