python - Real-time OCR in Python

Question

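I'm trying to capture my desktop with OpenCV and have Tesseract OCR find text and store it in a variable. For example, if I'm playing a game and place the capture region over a resource counter, I want the program to print that value and use it. A perfect example is a video by Michael Reeves:
whenever he loses health in the game, the program picks it up and sends it to his Bluetooth-enabled airsoft gun, which shoots him. So far I have this: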

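# imports
from PIL import ImageGrab
import numpy as np
import pytesseract  # not used yet; intended for the OCR step
import cv2

# Region of the screen to capture: top-left corner and size in pixels
x, y = 760, 968
ox, oy = 50, 22

# The writer's frame size must match the captured frames,
# otherwise OpenCV silently writes nothing to the file
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter("output.avi", fourcc, 5.0, (ox, oy))

while True:
    # screen capture
    img = ImageGrab.grab(bbox=(x, y, x + ox, y + oy))
    img_np = np.array(img)
    # PIL returns RGB, while OpenCV expects BGR
    frame = cv2.cvtColor(img_np, cv2.COLOR_RGB2BGR)
    cv2.imshow("Screen", frame)
    out.write(frame)

    # waitKey() returns -1 when no key is pressed, so compare against a real key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

out.release()
cv2.destroyAllWindows()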

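It captures in real time and displays the result in a window, but I have no idea how to make it recognize the text in each frame and output it.

Any help?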

Best answer

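It's fairly simple to grab the screen and pass it to Tesseract for OCR.
The PIL (Pillow) library can grab frames easily on macOS and Windows. However, this feature was only recently added for Linux, so the code below works around it being absent. (I'm on Ubuntu 19.10 and my Pillow does not support it.)
Essentially, the user starts the program with the coordinates of a screen-region rectangle. The main loop continually grabs this area of the screen and feeds it to Tesseract. If Tesseract finds any non-whitespace text in that image, it is written to stdout.
Note that this is not a proper real-time system. There is no guarantee of timeliness; each frame takes as long as it takes. Your machine might get 60 FPS or it might get 6. This will also be greatly influenced by the size of the rectangle you ask it to monitor.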

#! /usr/bin/env python3

import sys
import pytesseract
from PIL import Image

# Import ImageGrab if possible, might fail on Linux
try:
    from PIL import ImageGrab
    use_grab = True
except Exception as ex:
    # Some older versions of pillow don't support ImageGrab on Linux
    # In which case we will use XLib 
    if ( sys.platform == 'linux' ):
        from Xlib import display, X   
        use_grab = False
    else:
        raise ex


def screenGrab( rect ):
    """ Given a rectangle, return a PIL Image of that part of the screen.
        Handles a Linux installation with an older Pillow by falling back
        to using XLib """
    global use_grab
    x, y, width, height = rect

    if ( use_grab ):
        # Only ImageGrab was imported (not the PIL package itself), so call it directly
        image = ImageGrab.grab( bbox=( x, y, x+width, y+height ) )
    else:
        # ImageGrab can be missing under Linux
        dsp  = display.Display()
        root = dsp.screen().root
        raw_image = root.get_image( x, y, width, height, X.ZPixmap, 0xffffffff )
        image = Image.frombuffer( "RGB", ( width, height ), raw_image.data, "raw", "BGRX", 0, 1 )
        # DEBUG image.save( '/tmp/screen_grab.png', 'PNG' )
    return image


### Do some rudimentary command line argument handling
### So the user can specify the area of the screen to watch
if ( __name__ == "__main__" ):
    EXE = sys.argv[0]
    del( sys.argv[0] )

    # EDIT: catch zero-args
    if ( len( sys.argv ) != 4 or sys.argv[0] in ( '--help', '-h', '-?', '/?' ) ):  # some minor help
        sys.stderr.write( EXE + ": monitors section of screen for text\n" )
        sys.stderr.write( EXE + ": Give x, y, width, height as arguments\n" )
        sys.exit( 1 )

    # TODO - add error checking
    x      = int( sys.argv[0] )
    y      = int( sys.argv[1] )
    width  = int( sys.argv[2] )
    height = int( sys.argv[3] )

    # Area of screen to monitor
    screen_rect = [ x, y, width, height ]  
    print( EXE + ": watching " + str( screen_rect ) )

    ### Loop forever, monitoring the user-specified rectangle of the screen
    while ( True ): 
        image = screenGrab( screen_rect )              # Grab the area of the screen
        text  = pytesseract.image_to_string( image )   # OCR the image

        # IF the OCR found anything, write it to stdout.
        text = text.strip()
        if ( len( text ) > 0 ):
            print( text )

This answer was cobbled together from various other answers on SO.
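
The loop above only prints whatever Tesseract returns. To use the value as a variable, as the question asks, the text has to be turned into a number first. The following is a minimal sketch of one way to do that; `parse_first_int` is a hypothetical helper, not part of pytesseract, and the sample string only stands in for real OCR output. OCR text can be noisy, so the helper returns `None` when it finds no digits.

```python
import re


def parse_first_int(text):
    """Return the first run of digits in `text` as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# Sample string standing in for the output of pytesseract.image_to_string()
health = parse_first_int("HP 87/100")
if health is not None and health < 90:
    print("health is low:", health)
```

In the main loop this would be called on `text` after the `strip()`, and the result compared with the previous frame's value to detect a change.
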
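If you regularly use this answer for anything, it would be worth adding a rate limiter to save some CPU. It could probably sleep for half a second on each loop.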
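
A rough sketch of such a rate limiter, assuming the grab-and-OCR work is wrapped in a single callable. The name `run_rate_limited` and its parameters are made up for illustration; the `clock` and `sleep` arguments exist only so the loop can be exercised without real delays. Rather than always sleeping a fixed half second, it sleeps for whatever part of the interval the step did not use.

```python
import time


def run_rate_limited(step, interval=0.5, max_iterations=None,
                     clock=time.monotonic, sleep=time.sleep):
    """Call step() repeatedly, starting it at most once per `interval` seconds.

    max_iterations=None loops forever. Returns the number of iterations run.
    """
    count = 0
    while max_iterations is None or count < max_iterations:
        started = clock()
        step()
        count += 1
        # Sleep only for the part of the interval that step() did not use
        remaining = interval - (clock() - started)
        if remaining > 0:
            sleep(remaining)
    return count
```

With the script above, `step` would be a small function that grabs `screen_rect`, runs `pytesseract.image_to_string` on it, and prints any non-empty result.
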

Regarding "python - Real-time OCR in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52899174/
