python - Merging PDF files with Python and PyPDF2 raises a TypeError

I'm using Python 3.6.5 to merge PDFs, but I've run into a problem. The code below throws "TypeError: 'NumberObject' object is not subscriptable". What am I doing wrong? When I comment out the merger.append line, the file paths print correctly.

import webbrowser
import os
from PyPDF2 import PdfFileMerger, PdfFileReader

path = 'C:/test/pdfs'
merger = PdfFileMerger()
for pdf in os.listdir(path):
    merger.append(PdfFileReader(open(os.path.join(path,pdf), 'rb')))
    print(os.path.join(path,pdf))
merger.write(path+'/merged.pdf')
merger.close()
webbrowser.open_new(path+'/merged.pdf')

  File "C:\test\pdftest.py", line 9, in <module>
    merger.append(PdfFileReader(open(os.path.join(path,pdf), 'rb')))
  File "C:\python\lib\site-packages\pypdf2-1.26.0-py3.6.egg\PyPDF2\pdf.py", line 1084, in __init__
    self.read(stream)
  File "C:\python\lib\site-packages\pypdf2-1.26.0-py3.6.egg\PyPDF2\pdf.py", line 1805, in read
    assert xrefstream["/Type"] == "/XRef"
TypeError: 'NumberObject' object is not subscriptable

When I change merger.append to take the file path instead, I get:

  File "C:\test\pdftest.py", line 9, in <module>
    merger.append(os.path.join(path,pdf))
  File "C:\python\lib\site-packages\pypdf2-1.26.0-py3.6.egg\PyPDF2\merger.py", line 203, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "C:\python\lib\site-packages\pypdf2-1.26.0-py3.6.egg\PyPDF2\merger.py", line 133, in merge
    pdfr = PdfFileReader(fileobj, strict=self.strict)
  File "C:\python\lib\site-packages\pypdf2-1.26.0-py3.6.egg\PyPDF2\pdf.py", line 1084, in __init__
    self.read(stream)
  File "C:\python\lib\site-packages\pypdf2-1.26.0-py3.6.egg\PyPDF2\pdf.py", line 1805, in read
    assert xrefstream["/Type"] == "/XRef"
TypeError: 'NumberObject' object is not subscriptable

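Update: It looks like one of the PDFs in the folder is causing the problem. The only thing different about that PDF is that it uses Type 1 fonts, while the others use TrueType fonts. Does anyone know of a workaround or a fix?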

Best answer

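This appears to be caused by unrecognized or malformed formatting in the PDF. I'm not a PDF expert, but PyPDF2 seems to be stumbling over a record in the cross-reference (xref) table. The easiest way I found to get around this is to re-format the PDF.
What I did was put merger.append(PdfFileReader(file)) in a try block. If the exception message contains "'NumberObject' object is not subscriptable", I "convert" the PDF with LibreOffice in headless mode via a subprocess: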

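import subprocess

# dest_file_path (output folder) and file_name (the problem PDF) are set elsewhere
command = [r'"C:\Program Files\LibreOffice\program\soffice.bin"',
           '--convert-to', 'pdf', '--outdir', f'"{dest_file_path}"', f'"{file_name}"']
pdf_convert = subprocess.Popen(' '.join(command))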
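
The try-and-fall-back step described above is not shown in the answer, so here is a minimal sketch of how it might look. The names append_with_fallback, NUMBER_OBJECT_MSG and the convert callback are hypothetical and not part of PyPDF2; merger is assumed to be any object with an append(path) method, such as a PdfFileMerger.

```python
NUMBER_OBJECT_MSG = "'NumberObject' object is not subscriptable"


def append_with_fallback(merger, pdf_path, convert):
    """Append pdf_path to merger; on the NumberObject TypeError, retry with a converted copy.

    merger  -- any object with an append(path) method, e.g. PyPDF2's PdfFileMerger
    convert -- hypothetical callback: takes the failing path and returns the path
               of a re-written PDF (for example one produced by LibreOffice)
    Returns the path that was actually appended.
    """
    try:
        merger.append(pdf_path)
        return pdf_path
    except TypeError as exc:
        # Re-raise anything that is not the specific error from the question.
        if NUMBER_OBJECT_MSG not in str(exc):
            raise
    # Only reached for the NumberObject error: convert, then append the new file.
    converted_path = convert(pdf_path)
    merger.append(converted_path)
    return converted_path
```

In the question's loop this would replace the direct merger.append call, with convert being whatever function runs the LibreOffice command and returns the path of the rewritten file. If the converted copy fails as well, the second append raises as usual.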

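A note on using LibreOffice with subprocess: for whatever reason, I found that passing the command as a list caused an access-denied error on Windows, which is why I do the join instead.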
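
One possible explanation for the access-denied error, which the answer does not confirm, is the literal double quotes embedded in the list elements: when subprocess is given a list on Windows it quotes arguments itself, so the extra quotes become part of the executable path. The sketch below builds the list without embedded quotes. build_convert_command and convert_pdf are hypothetical helper names, and the returned output path assumes LibreOffice keeps the source file's base name.

```python
import os
import subprocess


def build_convert_command(soffice_path, source_pdf, out_dir):
    """Return an argument list for LibreOffice's PDF conversion.

    No element carries embedded quote characters; subprocess quotes
    arguments that contain spaces itself when it is given a list.
    """
    return [soffice_path, "--headless", "--convert-to", "pdf",
            "--outdir", out_dir, source_pdf]


def convert_pdf(soffice_path, source_pdf, out_dir):
    """Run the conversion, wait for it to finish, and return the assumed output path."""
    subprocess.run(build_convert_command(soffice_path, source_pdf, out_dir), check=True)
    # Assumption: the output keeps the input's base name inside out_dir.
    return os.path.join(out_dir, os.path.basename(source_pdf))
```

Unlike the bare Popen call, subprocess.run waits for LibreOffice to exit, so the converted file should exist before it is appended. Pointing out_dir at a different folder than the source avoids writing over the original PDF.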

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/49701323/
