python - Extracting text from an image

Tags: python image-processing ocr tesseract python-tesseract

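I am working on extracting text from an image.

The original image is in color with white text. After further processing, the text shows up as black and all other pixels as white (with some noise). Here is an example:

When I then run OCR on that processed image with pytesseract (Tesseract), I still get no text back.

Is there any way to extract the text from a color image?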

Best answer

from PIL import Image
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

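# load the image (OpenCV returns it in BGR channel order)
image = cv2.imread(args["image"])
if image is None:
    raise SystemExit("Could not read image: " + args["image"])
cv2.imshow("Original", image)

# apply an "average" (box) blur with a 3x3 kernel to smooth out noise
blurred = cv2.blur(image, (3, 3))
cv2.imshow("Blurred_image", blurred)

# PIL expects RGB, so convert from BGR before handing the image to Tesseract
img = Image.fromarray(cv2.cvtColor(blurred, cv2.COLOR_BGR2RGB))
text = pytesseract.image_to_string(img, lang='eng')
print(text)

# keep the preview windows open until a key is pressed
cv2.waitKey(0)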

The result I got: "Stay: in an Overwoter Bungalow $3»"

How about using contours to find and remove the unwanted blobs? That might help.

For more on extracting text from an image in Python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/46260970/
