python - How to apply smart thresholding to an image

Tags: python opencv image-processing ocr image-thresholding

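I am writing an OCR application (for Hebrew script).
The first stage of the application is thresholding.
This is what my original image looks like:
[image: original image]
And this is what it looks like after thresholding:
[image: after thresholding]
As you can see, the result is mostly fine, but the "crowns" or "decorations" on the letters sometimes disappear, as in this word:
[image: pnei original]
which becomes:
[image: pnei threshold]
The problem is that after I apply RGB2GRAY to the original image, the black crowns are not really dark enough, so they turn white during thresholding. It is easy to see that they "should" be black; the question is how to tell the algorithm to detect that.
My current thresholding code uses Otsu plus local (adaptive) thresholding. Here is the code: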

import cv2
import numpy as np
import pandas as pd

# NOTE: rgb2gray is the asker's grayscale helper; its definition is not shown in the question.


def apply_threshold(img, is_cropped=False):
    '''
    Apply a threshold to the image:
    first an Otsu threshold over the whole image, then adaptive thresholds
    whose block sizes are based on the size of the image.
    I apply a logical OR between all the thresholds, because my assumption is that a letter will always be black,
    while the background can sometimes be black and sometimes white,
    so I need to apply OR to make the background white.
    '''
    if len(np.unique(img)) == 2:  # img is already binary
        # return img
        gray_img = rgb2gray(img)
        _, binary_img = cv2.threshold(gray_img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return binary_img
    gray_img = rgb2gray(img)
    _, binary_img = cv2.threshold(gray_img.astype('uint8'), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    connectivity = 8
    output_stats = cv2.connectedComponentsWithStats(binary_img.max() - binary_img, connectivity, cv2.CV_32S)
    df = pd.DataFrame(output_stats[2], columns=['left', 'top', 'width', 'height', 'area'])[1:]
    # NOTE: the trailing "and False" disables this branch; it is kept for reference.
    if df['area'].max() / df['area'].sum() > 0.1 and is_cropped and False:
        binary_copy = gray_img.copy()
        # idxmax() returns the component label (df keeps the original labels as its index),
        # whereas argmax() returns a position that is off by one after the [1:] slice.
        largest = output_stats[1] == df['area'].idxmax()
        # Otsu threshold computed only from the pixels of the largest component
        TH1, _ = cv2.threshold(gray_img[largest].astype('uint8'), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        binary_copy[largest & (gray_img > TH1)] = 255
        binary_copy[largest & (gray_img <= TH1)] = 0

        # Separate Otsu threshold for all remaining pixels
        TH2, _ = cv2.threshold(gray_img[~largest].astype('uint8'), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        binary_copy[~largest & (gray_img > TH2)] = 255
        binary_copy[~largest & (gray_img <= TH2)] = 0
        binary_img = binary_copy.copy()
    # N = [3, 5, 7, 9, 11, 13,27, 45]  # sizes to divide the image shape in
    # N = [20,85]
    N = [3, 5, 25]
    min_dim = min(binary_img.shape)
    for n in N:
        block_size = int(min_dim / n)
        if block_size % 2 == 0:
            block_size += 1  # block_size needs to be odd
        binary_img = binary_img | cv2.adaptiveThreshold(gray_img.astype('uint8'), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                                        cv2.THRESH_BINARY, block_size, 10)


    return binary_img
Any ideas would be greatly appreciated!

Best answer

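One approach is division normalization in Python/OpenCV: divide the grayscale image by a heavily blurred copy of itself, which evens out the background so that faint dark marks stand out.
Input:
[image: input]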

import cv2
import numpy as np

# load image
img = cv2.imread("hebrew_text.jpg")

# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# blur
blur = cv2.GaussianBlur(gray, (99,99), 0)

# divide
divide = cv2.divide(gray, blur, scale=255)

# write result to disk
cv2.imwrite("hebrew_text_division.png", divide)

# display the grayscale and normalized images
cv2.imshow("gray", gray)
cv2.imshow("divide", divide)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
[image: result]
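After doing this, you may want to threshold the result, then clean it up by finding contours and discarding any contour whose area is smaller than that of the smallest accent mark.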
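If possible, I also suggest saving your images as PNG rather than JPG. JPG uses lossy compression, which introduces color changes. This may be the source of some of the extraneous marks you are seeing in the background.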

Source: this question on Stack Overflow: https://stackoverflow.com/questions/68714927/
