python - How do I call pypdfocr functions to use them in a Python script?

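I recently downloaded pypdfocr, but the documentation has no example of how to call pypdfocr as a library. Could someone help me call it just to convert a single file? All I have found is a terminal command: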

$ pypdfocr filename.pdf

Best answer

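If you are looking for the source code, it is usually under the site-packages directory of your Python installation. If you use an IDE (e.g. PyCharm), it will also help you find the directory and files, which is very useful for locating a class and seeing how to instantiate it. For example, https://github.com/virantha/pypdfocr/blob/master/pypdfocr/pypdfocr.py defines a pypdfocr class that you can reuse, and that can probably do what the command line does.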
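
Without an IDE, the standard library can also report where a module's source file lives. The following is a minimal sketch: the helper name `find_module_source` is made up for illustration, and the stdlib `json` package is used as a stand-in so the snippet runs even where pypdfocr is not installed.

```python
import importlib
import inspect


def find_module_source(module_name):
    """Return the path of the .py file defining module_name, or None."""
    try:
        module = importlib.import_module(module_name)
        # getsourcefile raises TypeError for built-in modules such as sys.
        return inspect.getsourcefile(module)
    except (ImportError, TypeError):
        return None


if __name__ == "__main__":
    # "json" is only a stand-in; try "pypdfocr.pypdfocr" where it is installed.
    print(find_module_source("json"))
```

Opening the reported file shows the class definitions and their constructors, which is the same information the IDE navigation gives you.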

In that class, the developer defines quite a few arguments to parse:

def get_options(self, argv):
    """
        Parse the command-line options and set the following object properties:
        :param argv: usually just sys.argv[1:]
        :returns: Nothing
        :ivar debug: Enable logging debug statements
        :ivar verbose: Enable verbose logging
        :ivar enable_filing: Whether to enable post-OCR filing of PDFs
        :ivar pdf_filename: Filename for single conversion mode
        :ivar watch_dir: Directory to watch for files to convert
        :ivar config: Dict of the config file
        :ivar watch: Whether folder watching mode is turned on
        :ivar enable_evernote: Enable filing to evernote
    """
    p = argparse.ArgumentParser(description = "Convert scanned PDFs into their OCR equivalent.  Depends on GhostScript and Tesseract-OCR being installed.",
            epilog = "PyPDFOCR version %s (Copyright 2013 Virantha Ekanayake)" % __version__,
            )

    p.add_argument('-d', '--debug', action='store_true',
        default=False, dest='debug', help='Turn on debugging')

    p.add_argument('-v', '--verbose', action='store_true',
        default=False, dest='verbose', help='Turn on verbose mode')

    p.add_argument('-m', '--mail', action='store_true',
        default=False, dest='mail', help='Send email after conversion')

    p.add_argument('-l', '--lang',
        default='eng', dest='lang', help='Language(default eng)')


    p.add_argument('--preprocess', action='store_true',
            default=False, dest='preprocess', help='Enable preprocessing.  Not really useful now with improved Tesseract 3.04+')

    p.add_argument('--skip-preprocess', action='store_true',
            default=False, dest='skip_preprocess', help='DEPRECATED: always skips now.')

    #---------
    # Single or watch mode
    #--------
    single_or_watch_group = p.add_mutually_exclusive_group(required=True)
    # Positional argument for single file conversion
    single_or_watch_group.add_argument("pdf_filename", nargs="?", help="Scanned pdf file to OCR")
    # Watch directory for watch mode
    single_or_watch_group.add_argument('-w', '--watch', 
         dest='watch_dir', help='Watch given directory and run ocr automatically until terminated')

    #-----------
    # Filing options
    #----------
    filing_group = p.add_argument_group(title="Filing optinos")
    filing_group.add_argument('-f', '--file', action='store_true',
        default=False, dest='enable_filing', help='Enable filing of converted PDFs')
    #filing_group.add_argument('-c', '--config', type = argparse.FileType('r'),
    filing_group.add_argument('-c', '--config', type = lambda x: open_file_with_timeout(p,x),
         dest='configfile', help='Configuration file for defaults and PDF filing')
    filing_group.add_argument('-e', '--evernote', action='store_true',
        default=False, dest='enable_evernote', help='Enable filing to Evernote')
    filing_group.add_argument('-n', action='store_true',
        default=False, dest='match_using_filename', help='Use filename to match if contents did not match anything, before filing to default folder')


    # Add flow option to single mode extract_images,preprocess,ocr,write

    args = p.parse_args(argv)
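
Note that the single-or-watch group above is created with `required=True`, so the parser rejects an argument list that contains neither a PDF filename nor `-w`. The sketch below rebuilds only that part of the parser (a cut-down copy for illustration, not pypdfocr's own code; `build_parser` and `try_parse` are hypothetical names) to show how an argument list maps to attributes.

```python
import argparse


def build_parser():
    """Cut-down copy of the single-or-watch part of the parser quoted above."""
    p = argparse.ArgumentParser(description="Illustration only")
    p.add_argument('-v', '--verbose', action='store_true',
                   default=False, dest='verbose')
    p.add_argument('-l', '--lang', default='eng', dest='lang')
    group = p.add_mutually_exclusive_group(required=True)
    group.add_argument("pdf_filename", nargs="?")
    group.add_argument('-w', '--watch', dest='watch_dir')
    return p


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # On bad input argparse prints a usage message to stderr and exits.
        return None


if __name__ == "__main__":
    args = try_parse(['-v', '-l', 'deu', 'filename.pdf'])
    print(args.pdf_filename, args.verbose, args.lang)
    print(try_parse([]))  # rejected: neither a filename nor -w was given
```

Each element of the list is a string, exactly as it would be typed on the command line.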

You can pass any of these arguments to its parser, like this:

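from pypdfocr.pypdfocr import PyPDFOCR  # class name as it appears in the linked pypdfocr.py; check your installed version

obj = PyPDFOCR()
# The parser requires either a PDF filename or -w/--watch, so an empty list makes argparse exit with an error.
# Arguments are strings, exactly as typed on the command line.
obj.get_options(['filename.pdf'])  # other options: ['-v', 'filename.pdf'] or ['-d', '-v', 'filename.pdf']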
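
If working with the class directly proves awkward, another route is to run the `pypdfocr filename.pdf` command from the question inside a script via `subprocess`. This is a sketch under assumptions: it presumes the `pypdfocr` executable is installed and on PATH, and the helper names `build_command` and `ocr_pdf` are invented here. The `-v` and `-l` flags are the ones defined in the parser quoted above.

```python
import subprocess


def build_command(pdf_path, lang=None, verbose=False):
    """Assemble the argument list for the pypdfocr command-line tool."""
    cmd = ["pypdfocr"]
    if verbose:
        cmd.append("-v")
    if lang:
        cmd.extend(["-l", lang])
    cmd.append(pdf_path)
    return cmd


def ocr_pdf(pdf_path, lang=None, verbose=False):
    """Run pypdfocr on one file.

    Raises subprocess.CalledProcessError if the tool exits with a
    non-zero status.
    """
    # Assumption: the pypdfocr executable is installed and on PATH.
    subprocess.check_call(build_command(pdf_path, lang=lang, verbose=verbose))


if __name__ == "__main__":
    # Only prints the command; call ocr_pdf("filename.pdf") to actually run it.
    print(build_command("filename.pdf", lang="eng", verbose=True))
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.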

I hope this helps you understand :)

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/39988381/
