python - Extracting text per page with Python pdfMiner?

Tags: python pdf

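I have tried using both pyPdf and pdfMiner to extract text from PDF files. Some of my PDFs are unfriendly, and only pdfMiner extracts them successfully. I am using the code here to extract the text of the whole file. However, I would really like to extract text page by page, like the getPage(i).extractText() functionality in pyPdf. Does anyone know how to extract the text of each page with pdfMiner?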

Best answer

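# Assumes `document` is an initialized PDFDocument instance (older pdfminer API).
for page_number, page in enumerate(document.get_pages()):
    if page_number == 42:  # enumerate() counts from 0, so this is the 43rd page
        pass  # do something with the page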

There is a good article on this here.
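
The loop above only selects a page; it does not show how to get that page's text. Here is a rough sketch of one way to do it, assuming the maintained pdfminer.six fork rather than the older pdfminer release the answer was written against. The function names text_by_page and extract_text_by_page are made up for illustration. extract_pages comes from pdfminer.six's high-level API and yields one layout object per page.

```python
def text_by_page(pages):
    """Map 1-based page numbers to the text found on each page.

    `pages` is any iterable of page layouts, for example the LTPage
    objects yielded by pdfminer.six's extract_pages().
    """
    result = {}
    for page_number, layout in enumerate(pages, start=1):
        # Top-level text containers (e.g. LTTextBox) expose get_text();
        # lines, rectangles, images and figures do not, so they are skipped.
        parts = [el.get_text() for el in layout if hasattr(el, "get_text")]
        result[page_number] = "".join(parts)
    return result


def extract_text_by_page(path):
    """Hypothetical convenience wrapper: read the PDF at `path` page by page."""
    # Imported here so that text_by_page() stays usable without pdfminer.six.
    from pdfminer.high_level import extract_pages

    return text_by_page(extract_pages(path))
```

Calling extract_text_by_page("some.pdf")[3] would then play roughly the role of pyPdf's getPage(2).extractText(), with the difference that pages are numbered from 1 here. Text nested inside figures is not collected by this sketch.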

For python - Extracting text per page with Python pdfMiner?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12605170/
