python - ValueError: seek of closed file when working with PyPDF2

I am trying to extract text from a PDF file. Here is the code:

from PyPDF2 import PdfFileReader
with open('HTTP_Book.pdf', 'rb') as file:
    pdf = PdfFileReader(file)

page = pdf.getPage(1)
#print(dir(page))
print(page.extractText())

This gives me the error:

ValueError: seek of closed file

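If I move the code inside the with block, it works fine. My question is: why does this happen? I have already stored the information in the "pdf" object, so I should be able to access it outside the block.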

Best Answer

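PdfFileReader expects a seekable, open stream. It does not load the entire file into memory, so you have to keep the file open in order to call methods such as getPage. Your assumption that creating the reader automatically reads the whole file is incorrect.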
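To make the "does not load the entire file" point concrete, here is a minimal sketch of a lazy reader. LazyPageReader, its page_size parameter, and the fixed-size "pages" are hypothetical names invented for illustration; this is not how PyPDF2 is implemented internally, only the same general pattern of keeping a stream reference and reading on demand.

```python
import io


class LazyPageReader:
    """Hypothetical stand-in for a lazy reader; not PyPDF2's real internals."""

    def __init__(self, stream, page_size):
        # Only a reference to the stream is kept; no page data is copied here.
        self.stream = stream
        self.page_size = page_size

    def get_page(self, index):
        # Data is fetched on demand, so the stream must still be open.
        self.stream.seek(index * self.page_size)
        return self.stream.read(self.page_size)


stream = io.BytesIO(b"page0page1page2")
reader = LazyPageReader(stream, page_size=5)
page_while_open = reader.get_page(1)  # works while the stream is open

stream.close()
try:
    reader.get_page(1)  # the reader still exists, but its stream is gone
    failed_after_close = False
except ValueError:
    failed_after_close = True

print(page_while_open, failed_after_close)
```

The reader object outlives the stream, but it becomes unusable as soon as the stream is closed, which is the situation in the question.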

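A with statement operates on a context manager, such as a file. When the with block ends, the context manager's __exit__ method is called. In this case, it closes the file handle that your PdfFileReader is trying to use to get the second page.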
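The same behavior can be reproduced with a plain file and no PDF library at all. The sketch below uses a throwaway temporary file (an assumption made only to keep the example self-contained) and then calls seek after the with block has ended.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example bytes")
    path = tmp.name

with open(path, "rb") as handle:
    first = handle.read(7)  # works: the file is open inside the block

# The name "handle" still exists here, but __exit__ has closed the file.
closed_after_block = handle.closed

try:
    handle.seek(0)  # the same kind of call a lazy reader makes later
    error_message = None
except ValueError as exc:
    error_message = str(exc)

os.remove(path)
print(closed_after_block, error_message)
```

Leaving the with block does not delete the variable; it only closes the underlying file, so any later seek or read on it raises ValueError.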

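As you found out, the correct approach is to read everything you need from the PDF before closing the file. If, and only if, your program needs the PDF open until the very end, you can pass the file name directly to PdfFileReader. There is no (documented) way to close the file afterwards, though, so I recommend the version you already have working: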

from PyPDF2 import PdfFileReader
with open('HTTP_Book.pdf', 'rb') as file:
    pdf = PdfFileReader(file)
    page = pdf.getPage(1)
    print(page.extractText())
# file is closed here, pdf will no longer do its job
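If the reader really does need to outlive the file, one possible workaround (not part of the original answer) is to copy the file's bytes into an in-memory buffer, since PdfFileReader only needs a seekable, open stream. The helper names read_into_buffer and open_pdf_in_memory below are hypothetical, and the sketch assumes the PDF is small enough to hold in memory.

```python
import io
import os
import tempfile


def read_into_buffer(path):
    """Copy the whole file into an in-memory, seekable buffer."""
    with open(path, "rb") as handle:
        return io.BytesIO(handle.read())


def open_pdf_in_memory(path):
    """Assumed usage: hand the buffer to PdfFileReader instead of the file."""
    # Imported here so the helper above can run without PyPDF2 installed.
    from PyPDF2 import PdfFileReader
    return PdfFileReader(read_into_buffer(path))


# Stdlib-only check of the buffer helper, using a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"%PDF-1.4 placeholder bytes")
    sample_path = tmp.name

buffer = read_into_buffer(sample_path)
os.remove(sample_path)  # the file on disk is gone, the buffer is still usable
buffer.seek(0)
header = buffer.read(8)
print(header)
```

The trade-off is memory: the whole file is held in RAM for as long as the reader is alive, whereas the with-block version above only keeps the file open while it is being read.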

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/55991402/
