python - Removing sparse text on the edge of an image

I have a scanned image with some text (two B's) on the edge (the left or right side of the image) that I want to remove.

Here is the code I tried:

import cv2
import numpy as np
# Load the image

img = cv2.imread("1.jpg", cv2.IMREAD_GRAYSCALE)

# Apply thresholding to obtain a binary image
_, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)

# Use connectedComponentsWithStats to get the objects' properties
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=8)

# Remove small objects
for i in range(1, num_labels):
    x = stats[i][0]
    w = stats[i][2]
    if x < 50 and w < 50 and labels[stats[i][1], stats[i][2]] != 0:
        labels[labels == i] = 0

# Convert the labels image to 8-bit unsigned integer
labels = labels.astype('uint8')

# Apply connected component analysis again to update the labels
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(labels, connectivity=8)
cv2.imwrite("output_image.png", labels*255/num_labels)

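But this produced an image with a black background, and the edge text was not removed either.

In addition, here is an image with sparse text on the right edge: the grammar information on the right should be removed entirely, just like the B's in the first image.

Best answer

Instead of deleting the left-side labels and applying connected component analysis again, we can fill the left-side connected components in img with white.

Replace labels[labels == i] = 0 with: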

img[labels == i] = 255

Write img as the output: cv2.imwrite("output_image.png", img).
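One likely contributor to the mostly black picture in the question is that the label map itself was written out: background pixels carry label 0, and casting labels to uint8 wraps any label above 255 around modulo 256. The sketch below only illustrates that wraparound on a made-up array (the values are assumptions chosen for the demonstration, not taken from the asker's image):

```python
import numpy as np

# Hypothetical label values, including some above 255.
labels = np.array([[0, 1, 256],
                   [0, 300, 511]], dtype=np.int32)

# astype() performs an unchecked cast, so values wrap modulo 256.
as_uint8 = labels.astype(np.uint8)
print(as_uint8)  # label 256 becomes 0 and is indistinguishable from background
```

Painting directly into img, as suggested above, avoids converting the label map at all.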

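There is a problem in the expression labels[stats[i][1], stats[i][2]] != 0:
stats[i][1] is equivalent to stats[i][cv2.CC_STAT_TOP] (no problem).
stats[i][2] is equivalent to stats[i][cv2.CC_STAT_WIDTH] (this is the problem: a width is used as a column index).
Replace labels[stats[i][1], stats[i][2]] != 0 with: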

labels[stats[i][cv2.CC_STAT_TOP], stats[i][cv2.CC_STAT_LEFT]] != 0

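Note:
The condition above is unnecessary and can be removed: the loop starts at i = 1, so every label it visits is already greater than zero (not background).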

To improve the result, we can dilate thresh slightly; this is needed to remove the gray "halo" around the B letters.

Code sample:

import cv2
import numpy as np
   
img = cv2.imread("1.jpg", cv2.IMREAD_GRAYSCALE)  # Load the image

# Apply thresholding to obtain a binary image
_, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)

thresh = cv2.dilate(thresh, np.ones((3, 3), np.uint8))  # Dilate thresh because the B letters have a gray "halo".

# Use connectedComponentsWithStats to get the objects' properties
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=8)

# Remove small objects
for i in range(1, num_labels):
    x = stats[i][cv2.CC_STAT_LEFT]
    w = stats[i][cv2.CC_STAT_WIDTH]

    #if x < 50 and w < 50 and labels[stats[i][1], stats[i][2]] != 0:
    if x < 50 and w < 50:  # and labels[stats[i][cv2.CC_STAT_TOP], stats[i][cv2.CC_STAT_LEFT]] != 0:
        # labels[labels == i] = 0
        img[labels == i] = 255  # Fill img with white where labels == i

# Convert the labels image to 8-bit unsigned integer
#labels = labels.astype('uint8')

# Apply connected component analysis again to update the labels
#num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(labels, connectivity=8)
#cv2.imwrite("output_image.png", labels*255/num_labels)
cv2.imwrite("output_image.png", img)

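Output:

Notes:

The original image has a black column running from top to bottom.
That black column is removed as well.
A small remainder of the B letters is left; we may need to dilate thresh with a larger kernel.
For the posted sample image, we could simply fill the entire left side of the image with white (no need to search for connected components):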
img[:, 0:50] = 255

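Update: finding the size of the margin.

There are several ways to find the size of the margin.
These methods are heuristics and are not guaranteed to work in all cases.

One approach is based on the assumption that the text in the margin is "sparse".
We can sum the pixels of each column and use the sums to decide which part is the margin.

Example: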

import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread("2.jpg", cv2.IMREAD_GRAYSCALE)

# Downscale image in the horizontal axis - each column of resized_img is 1% of the total width
resized_img = cv2.resize(img, (100, img.shape[0]), interpolation=cv2.INTER_AREA)  # Pass interpolation by keyword; the third positional argument of cv2.resize is dst.
_, thresh = cv2.threshold(resized_img, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
sum_of_rows = np.sum(thresh==255, axis=0)  # Count the foreground pixels in each column (thresh==255 converts 255 to True, which sums as 1).

plt.figure()
plt.plot(sum_of_rows)
plt.show()

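Plot:

We may look for gaps (columns where the sum is zero).
We may look for contiguous sections where the sum is below some threshold (the threshold may, for example, be relative to the median of the sums).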
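The second idea can be sketched with plain NumPy. The helper name find_sparse_runs and the rel_threshold parameter are hypothetical, and the column sums in the demo are made-up numbers rather than values measured from the posted images:

```python
import numpy as np

def find_sparse_runs(column_sums, rel_threshold=0.5):
    """Return (start, stop) pairs, stop exclusive, of consecutive columns
    whose sum is below rel_threshold * median(column_sums)."""
    column_sums = np.asarray(column_sums, dtype=float)
    limit = rel_threshold * np.median(column_sums)
    sparse = column_sums < limit
    # Pad with False so that every run has both a rising and a falling edge.
    padded = np.concatenate(([False], sparse, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return [(int(a), int(b)) for a, b in zip(starts, stops)]

# Made-up column sums: sparse margin text, a gap, dense body text, an empty right margin.
demo_sums = [3, 4, 0, 0, 40, 42, 38, 41, 39, 40, 0, 0]
runs = find_sparse_runs(demo_sums)
print(runs)
```

A run that touches index 0 or the last index is a candidate margin. Note that if more than half of the columns are empty, the median is zero and this sketch finds nothing, which is one of the ways such a heuristic can fail.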

I am sure you can find a third example for which the heuristic fails.
Finding a robust solution may be challenging...

Regarding "python - Removing sparse text on the edge of an image", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/76288642/
