python - pytesseract.image_to_string does not seem to extract text from an image

Tags: python opencv python-tesseract

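I am trying to extract text from an image. The code below works on other images I have tried, but not on this one. Is there something wrong with the code?

The image I am trying to extract text from: Original Image. Here is the code: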

import cv2
import pytesseract
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

try:
    from PIL import Image
except ImportError:
    import Image

# Load the image as grayscale, upscale it 2x, and blur it to reduce noise
img = cv2.imread("sample01.png", cv2.IMREAD_GRAYSCALE)
print('Dimension of image: {}'.format(img.ndim))
img = cv2.resize(img, None, fx=2, fy=2)
blur = cv2.GaussianBlur(img, (5, 5), 0)

# Apply adaptive thresholding (mean of an 11x11 neighbourhood, C = 2)
th2 = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 11, 2)
cv2.imwrite('resize_adaptive_threshmean.png', th2)

# Apply Tesseract to detect words
print(pytesseract.image_to_string(Image.open('resize_adaptive_threshmean.png')))   
print("=========================================================")

Is there something wrong with the code?

Best answer

Well, you can use adaptive thresholding with different parameters:

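import cv2
import pytesseract

# Load the image and convert it to grayscale
img = cv2.imread("ACtBA.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Mean adaptive threshold: maxValue=100, blockSize=15, C=16
flt = cv2.adaptiveThreshold(gry,
                            100, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY, 15, 16)

# pytesseract accepts the NumPy array directly
txt = pytesseract.image_to_string(flt)
print(txt)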

The thresholded image (not reproduced here) is what gets passed to Tesseract.

Result:

Parking: You may park anywhere on the campus where there are no signs prohibiting par-
king. Keep in mind the carpool hours and park accordingly so you do not get blocked in the
afternoon

Under Schoo! Age Children.While we love the younger children, it can be disruptive and
inappropriate to have them on campus during school hours. There may be special times
that they may be invited or can accompany a parent volunteer, but otherwise we ask that
you adhere to our —_ policy for the benefit of the students and staff.

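I tested with different parameters, and I think the most suitable ones for this image are:

maxValue = 100  # value assigned to pixels that pass the threshold test (it is not a cutoff)

blockSize = 15  # size of the neighbourhood area; must be an odd number

C = 16  # constant subtracted from the mean (or weighted mean) of the neighbourhood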
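
To make the role of the three parameters concrete, here is a rough pure-Python sketch of the rule that mean adaptive thresholding applies with THRESH_BINARY: a pixel is set to maxValue if it is brighter than the mean of its blockSize x blockSize neighbourhood minus C, and to 0 otherwise. The function name mean_adaptive_threshold and the list-of-lists image are my own illustration, not part of OpenCV, and the border handling and rounding are simplified, so do not expect pixel-for-pixel agreement with cv2.adaptiveThreshold.

```python
def mean_adaptive_threshold(gray, max_value, block_size, c):
    """Simplified mean adaptive threshold on a 2-D list of ints (0-255).

    Illustrative only: a pixel becomes max_value if it is brighter than
    (local mean - c), otherwise 0. Edge pixels are replicated at the border.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")

    height, width = len(gray), len(gray[0])
    radius = block_size // 2
    out = [[0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            # Sum the block_size x block_size neighbourhood around (y, x),
            # clamping coordinates so the window never leaves the image.
            total = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += gray[yy][xx]
            local_mean = total / (block_size * block_size)

            # THRESH_BINARY rule: keep the pixel only if it exceeds the
            # local threshold (mean minus the constant c).
            if gray[y][x] > local_mean - c:
                out[y][x] = max_value
    return out


if __name__ == "__main__":
    # Bright 5x5 background with one dark "ink" pixel in the centre.
    image = [[200] * 5 for _ in range(5)]
    image[2][2] = 50
    for row in mean_adaptive_threshold(image, 100, 3, 16):
        print(row)
```

In this sketch, a larger C pushes more low-contrast pixels to the background value, while a larger blockSize averages over a wider area, which is presumably why the pair 15 and 16 behaves differently from the question's 11 and 2 on this image.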

Regarding python - pytesseract.image_to_string does not seem to extract text from an image, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64993072/
