python - tabula: FileNotFoundError: [Errno 2] (but the file path is correct)

Tags: python ipython jupyter tabula

Question:

import tabula as tb
import pandas as pd

other = "https://github.com/chezou/tabula-py/raw/master/tests/resources/data.pdf"
dfs = tb.read_pdf(other, stream=True) #this works

file="D:\Favorites\1. Programming\Projects\cell penetrating peptide supplemental.pdf"
tables = tb.read_pdf(file, pages = "all", multiple_tables = True)
tables

Output:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-29-c598474e8fa3> in <module>
      6 
      7 file="D:\Favorites\1. Programming\Projects\cell penetrating peptide supplemental.pdf"
----> 8 tables = tb.read_pdf(file, pages = "all", multiple_tables = True)
      9 tables

~\anaconda3\lib\site-packages\tabula\io.py in read_pdf(input_path, output_format, encoding, java_options, pandas_options, multiple_tables, user_agent, **kwargs)
    312 
    313     if not os.path.exists(path):
--> 314         raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)
    315 
    316     if os.path.getsize(path) == 0:

FileNotFoundError: [Errno 2] No such file or directory: 'D:\\Favorites\x01. Programming\\Projects\\cell penetrating peptide supplemental.pdf'

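It seems that no one else who ran into this problem got it resolved.

The first suggestion I followed was to check whether the file actually exists.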

import os

file=r"D:\Favorites\1. Programming\Projects\cell penetrating peptide supplemental.pdf"

print(os.path.isfile(file))
print(os.path.exists(file))
print(os.path.getsize(file) == 0)

Output:

True
True
False

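Why does it raise an error that should only be raised when os.path.exists(file) returns False?

I tried a file from the internet and it worked perfectly. The file I am trying to read has no URL that I can view in a browser; my only option is to download it. Otherwise I would simply pass its URL to the function.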

Update: I tried the suggested solution:

import tabula as tb
import pandas as pd


tables = tb.read_pdf(r"D:\Favorites\1. Programming\Projects\cell penetrating peptide supplemental.pdf", pages = "all", multiple_tables = True)
tables

and got this:

Got stderr: Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 4 (33) in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 3 (34) in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (35) in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (36) in font PKLNYU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font FLAXFE+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (33) in font FLAXFE+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (34) in font FLAXFE+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 4 (33) in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 3 (34) in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (35) in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (36) in font BPOUDD+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font DCUQIG+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (33) in font DCUQIG+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font DREOWG+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (33) in font DREOWG+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font EWGNLJ+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (33) in font EWGNLJ+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (34) in font EWGNLJ+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font PUHGFM+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (33) in font PUHGFM+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (34) in font PUHGFM+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 4 (33) in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 3 (34) in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (35) in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 2 (36) in font UHIZXI+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font UCENHU+CambriaMath
Jun 28, 2020 11:17:13 AM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
WARNING: No Unicode mapping for 1 (33) in font UCENHU+CambriaMath

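Best answer

tabula-py's read_pdf calls a helper named localize_file, which in turn calls os.path.expanduser to expand the path. On Unix-like systems, for example, "~" is an alias for the user's home directory, so on Mac OS X os.path.expanduser performs the following expansion: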

>>> os.path.expanduser("~/Documents")
'/Users/username/Documents'

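The backslash problem, however, does not come from this function or from os.fspath, which it calls internally: both return a str unchanged. Python itself interprets backslash escape sequences in an ordinary string literal when the source is parsed, so the escape has already been applied before any function receives the string. "\125" is the octal escape for the character "U", which is why both calls below return 'U':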

>>> os.path.expanduser("\125")
'U'
>>> os.fspath("\125")
'U'
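
To separate the two effects, here is a minimal sketch that compares an ordinary literal with a raw literal. The short path fragment and the variable names `plain` and `raw` are made up for illustration; only `os.path.expanduser` and `os.fspath` come from the discussion above.

```python
import os

# Ordinary literal: "\1" is an octal escape, so Python stores chr(1) here.
plain = "Favorites\1. Programming"
# Raw literal: the backslash and the digit 1 are kept as two characters.
raw = r"Favorites\1. Programming"

print(repr(plain))
print(repr(raw))

# The raw string is one character longer because "\1" was not collapsed.
print(len(raw) - len(plain))

# Neither function changes a string that does not start with "~".
print(os.path.expanduser(raw) == raw)
print(os.fspath(plain) == plain)
```

The difference is already visible in `repr(plain)`, before either function is called, which points at the literal rather than at `expanduser` as the place where the path is changed.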

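In your case, the \1 in the path was interpreted as the escape \x01, so Windows cannot find such a directory. Your os.path.exists check returned True because that snippet already used a raw string, while the failing read_pdf call used an ordinary one. To keep the path unchanged, pass it as a raw string by putting an r in front of it, like this: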

>>> os.path.expanduser(r"\125")
'\\125'
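
A raw string is not the only spelling that keeps the backslashes intact. The sketch below shows three equivalent ways to write the path from the question, assuming a Windows-style path; the variable names are illustrative, and `PureWindowsPath` is used only so that the comparison also runs on non-Windows machines.

```python
from pathlib import PureWindowsPath

# 1. Raw string: backslashes are taken literally.
raw = r"D:\Favorites\1. Programming\Projects\cell penetrating peptide supplemental.pdf"
# 2. Doubled backslashes: each "\\" is one literal backslash.
escaped = "D:\\Favorites\\1. Programming\\Projects\\cell penetrating peptide supplemental.pdf"
# 3. Forward slashes: nothing to escape; Windows path handling accepts them.
forward = "D:/Favorites/1. Programming/Projects/cell penetrating peptide supplemental.pdf"

print(raw == escaped)
# PureWindowsPath normalises the separators, so all spellings name the same path.
print(PureWindowsPath(forward) == PureWindowsPath(raw))
print(str(PureWindowsPath(forward)) == raw)
```

Any of the three strings should be usable as the first argument to tb.read_pdf in place of the original literal; which one to prefer is mostly a matter of taste.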

References:

tabula's read_pdf line 311 localize_file is invoked

tabula's localize_file line 72 os.path.expanduser is invoked

Python's expanduser line 293 fspath is invoked

a reference to ANSI escape sequences

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/62604522/
