I'm experimenting with NLTK in an Azure Synapse notebook. When I try to run nltk.download('stopwords'), I get the following error:
ValueError: I/O operation on closed file
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/nltk/downloader.py", line 782, in download
    show(msg.message)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/nltk/downloader.py", line 775, in show
    subsequent_indent=prefix + prefix2 + " " * 4,
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1616860588116_0001/container_1616860588116_0001_01_000001/tmp/9026485902214290372", line 536, in write
    super(UnicodeDecodingStringIO, self).write(s)
ValueError: I/O operation on closed file
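The ValueError is raised while NLTK is printing a message, not by the download logic itself: the traceback shows NLTK's internal `show` helper formatting text with `textwrap.fill` and then writing it to a stream whose `write` (the notebook's `UnicodeDecodingStringIO`, apparently a `StringIO` subclass) fails because the stream is already closed. The stdlib-only reproduction below assumes a plain `io.StringIO` can stand in for the notebook's output wrapper; `show_message` is a hypothetical stand-in for NLTK's helper, not its actual code, and the `[nltk_data] ` prefix only mirrors what NLTK prints.

```python
import io
import textwrap


def show_message(stream, message, prefix="[nltk_data] "):
    """Hypothetical stand-in for NLTK's internal `show` helper:
    wrap `message` with a prefix (as in the traceback) and print it to `stream`."""
    text = textwrap.fill(
        message,
        initial_indent=prefix,
        subsequent_indent=prefix + " " * 4,
    )
    print(text, file=stream)


def reproduce_closed_stream_error():
    """Return the error message raised when printing to an already-closed stream."""
    stream = io.StringIO()  # assumption: stands in for the notebook's output wrapper
    stream.close()          # the wrapper is closed before the downloader writes to it
    try:
        show_message(stream, "Downloading package stopwords...")
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce_closed_stream_error())  # I/O operation on closed file
```

The same helper works normally on an open stream, which suggests the problem lies in the notebook's stream handling rather than in the text NLTK is trying to print.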
If I instead run nltk.download() with no arguments, I get the following error:
EOFError: EOF when reading a line
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/nltk/downloader.py", line 765, in download
    self._interactive_download()
  File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/nltk/downloader.py", line 1117, in _interactive_download
    DownloaderShell(self).run()
  File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/nltk/downloader.py", line 1143, in run
    user_input = input("Downloader> ").strip()
EOFError: EOF when reading a line
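This second error comes from the interactive downloader: with no arguments, `nltk.download()` starts `DownloaderShell`, which calls `input("Downloader> ")`, and the notebook kernel provides no readable standard input. `input()` raises `EOFError` whenever its input stream is exhausted. The sketch below reproduces that with the standard library alone, assuming an empty `io.StringIO` is a fair stand-in for the notebook's stdin; `read_downloader_command` is a hypothetical helper, not part of NLTK.

```python
import io
import sys


def read_downloader_command(stdin_text, prompt="Downloader> "):
    """Call input() the way the traceback shows DownloaderShell.run doing,
    but with sys.stdin replaced by `stdin_text`. Returns the stripped line,
    or a description of the EOFError if no input is available."""
    saved_stdin, saved_stdout = sys.stdin, sys.stdout
    sys.stdin = io.StringIO(stdin_text)  # assumption: models the kernel's stdin
    sys.stdout = io.StringIO()           # swallow the prompt text
    try:
        return input(prompt).strip()
    except EOFError as exc:
        return "EOFError: " + str(exc)
    finally:
        # Always restore the real streams, even if input() raised.
        sys.stdin, sys.stdout = saved_stdin, saved_stdout


if __name__ == "__main__":
    print(read_downloader_command(""))       # EOFError: EOF when reading a line
    print(read_downloader_command(" d \n"))  # d
```

In other words, the interactive form of `nltk.download()` cannot work in an environment without a usable stdin, regardless of network or storage access.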
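I'm hoping someone can help me understand what might be causing this and how to fix it. I haven't been able to find much information on where to go from here.
Edit: the code I used to produce the error is: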
import nltk
nltk.download('stopwords')
Update: I ended up opening a support request with Microsoft, and this was their response:
Synapse does not support arbitrary shell scripts which is where you would download the related model corpus for NLTK
They suggested using sc.addFile, which I eventually got working. So in case anyone else runs into this, here is what I did:
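import os
import nltk
from pyspark import SparkFiles

# `sc` is the notebook's SparkContext (used directly, as in the original code).
# Ship the nltk_data folder (containing corpora/stopwords) from ADLS Gen2 storage
# to the cluster; recursive=True is required when adding a directory.
sc.addFile('abfss://<file_system>@<account_name>.dfs.core.windows.net/synapse/workspaces/<workspace_name>/nltk_data/', recursive=True)

# Tell NLTK to also search the local copy that addFile placed under the SparkFiles root
nltk.data.path.append(os.path.join(SparkFiles.getRootDirectory(), 'nltk_data'))

nltk.corpus.stopwords.words('english')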
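The `sc.addFile` approach only works if the `nltk_data` folder in storage has the layout NLTK expects, with each stopword list stored one word per line under `corpora/stopwords/<language>`. Appending a directory to `nltk.data.path` adds it to the list of roots NLTK searches for resources. The sketch below is a simplified, stdlib-only model of that lookup (plain directories only, no zip archives, and not NLTK's actual implementation); `find_resource` and `load_stopwords` are hypothetical helpers that could be used to sanity-check the folder before uploading it.

```python
import os
import tempfile


def find_resource(search_paths, resource):
    """Return the first existing path for `resource` (e.g. 'corpora/stopwords')
    under the given roots, checking them in order (first match wins).
    Simplified model: plain directories only, no .zip handling."""
    parts = resource.split("/")
    for root in search_paths:
        candidate = os.path.join(root, *parts)
        if os.path.exists(candidate):
            return candidate
    raise LookupError("Resource %r not found in %r" % (resource, list(search_paths)))


def load_stopwords(search_paths, language="english"):
    """Read a stopword list (one word per line) from the first matching root."""
    path = find_resource(search_paths, "corpora/stopwords/" + language)
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    # Build a throwaway nltk_data folder with the expected layout.
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "nltk_data")
        os.makedirs(os.path.join(root, "corpora", "stopwords"))
        with open(os.path.join(root, "corpora", "stopwords", "english"), "w", encoding="utf-8") as f:
            f.write("i\nme\nmy\n")
        # Appending the root makes the corpus discoverable, analogous to
        # nltk.data.path.append(...) in the notebook code.
        search_paths = ["/nonexistent/nltk_data"]
        search_paths.append(root)
        print(load_stopwords(search_paths))  # ['i', 'me', 'my']
```

If `find_resource` raises `LookupError` for the folder you plan to upload, NLTK will most likely fail to find the corpus as well, so the folder structure is worth checking before calling `sc.addFile`.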
Thanks!
Regarding "python - Running nltk.download in an Azure Synapse notebook: ValueError: I/O operation on closed file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66833402/