python - PyPDF2 PdfFileWriter has no attribute stream

Tags: python pdf pypdf

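I'm trying to split a PDF into pages and save each page as a new PDF. I tried the approach from an earlier question without success, and a PyPDF2 splitting example from elsewhere, also without success. Edit: looking at the output files, I can see that the first page is written successfully, and a PDF for the second page is then created, but it is empty.

Here is the code I'm trying to run: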

from PyPDF2 import PdfFileWriter, PdfFileReader

inputpdf = PdfFileReader(open("my_pdf.pdf", "rb"))

for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)

Here is the full error message:

Traceback (most recent call last):
  File "pdf_functions.py", line 9, in <module>
    output.write(outputStream)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 557, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/lib/python3.4/dist-packages/PyPDF2/pdf.py", line 575, in _sweepIndirectReferences
    if data.pdf.stream.closed:
AttributeError: 'PdfFileWriter' object has no attribute 'stream'

I also tried the following, which confirms that a single page can be extracted on its own:

from PyPDF2 import PdfFileWriter, PdfFileReader
inputpdf = PdfFileReader(open("/home/ubuntu/inputs/cityshape/form5.pdf", "rb"))

#for i in range(inputpdf.numPages):
output = PdfFileWriter()
output.addPage(inputpdf.getPage(2))
with open("document-page2.pdf", "wb") as outputStream:
    output.write(outputStream)

Best answer

The same thing happened to me.

I was able to fix it by moving the following line inside the loop:

inputpdf = PdfFileReader(open("/home/ubuntu/inputs/cityshape/form5.pdf", "rb"))

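I believe some versions of PyPDF2 have a bug where calling the PdfFileWriter.write method interferes with the PdfFileReader instance. Recreating the PdfFileReader instance after each write works around the bug.

The following code should work (untested):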

from PyPDF2 import PdfFileWriter, PdfFileReader

# Keep the input file open for the whole loop; it is closed automatically.
with open("my_pdf.pdf", "rb") as pdf_in_file:
    # Read the page count once, before the loop.
    pages_no = PdfFileReader(pdf_in_file).numPages

    for i in range(pages_no):
        # Recreate the reader for every page to work around the bug.
        inputpdf = PdfFileReader(pdf_in_file)
        output = PdfFileWriter()
        output.addPage(inputpdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
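If you need this in more than one place, the same workaround can be wrapped in a small function. This is only a sketch and, like the code above, untested against the PDF from the question. The names `split_pdf` and `page_path` are made up for illustration and are not part of PyPDF2. It also assumes a PyPDF2 release that still provides the `PdfFileReader` / `PdfFileWriter` names used in the question.

```python
import os


def page_path(out_dir, stem, index):
    """Hypothetical helper: build the output path for one page."""
    return os.path.join(out_dir, "%s-page%d.pdf" % (stem, index))


def split_pdf(src_path, out_dir="."):
    """Hypothetical helper: write each page of src_path to its own PDF.

    Returns the list of paths that were written.
    """
    # Imported here so page_path() stays usable without PyPDF2 installed.
    # Assumption: a PyPDF2 release that still ships these legacy class names.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    stem = os.path.splitext(os.path.basename(src_path))[0]
    written = []
    with open(src_path, "rb") as src:
        page_count = PdfFileReader(src).numPages
        for i in range(page_count):
            # Fresh reader per page: the workaround described in this answer.
            reader = PdfFileReader(src)
            writer = PdfFileWriter()
            writer.addPage(reader.getPage(i))
            target = page_path(out_dir, stem, i)
            with open(target, "wb") as dst:
                writer.write(dst)
            written.append(target)
    return written
```

Calling `split_pdf("my_pdf.pdf")` would name the output files `my_pdf-page0.pdf`, `my_pdf-page1.pdf`, and so on in the current directory. Whether every file gets real page content still depends on the reader-per-page workaround holding for your PyPDF2 version.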

Source: the original question on Stack Overflow, "PyPDF2 PdfFileWriter has no attribute stream": https://stackoverflow.com/questions/40168027/
