python - Get all component stats of multiple arrays labeled by one of them

I already asked a similar question, which was answered, but here it is in more detail:

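I need a very fast way to get all the important component stats of two arrays, where one of the arrays is labeled with OpenCV (cv2) and its labels define the component regions for both arrays. The stats of every component, masked on both arrays, should then be saved to a dictionary. My approach works, but it is too slow. Is there a way to avoid the loop, or a better approach than ndimage.labeled_comprehension?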

from scipy import ndimage
import numpy as np
import cv2

def calculateMeanMaxMin(val):
    return np.array([np.mean(val),np.max(val),np.min(val)])

def getTheStatsForComponents(array1,array2):
    ret, thresholded= cv2.threshold(array2, 120, 255, cv2.THRESH_BINARY)
    thresholded= thresholded.astype(np.uint8)
    # the label image type (ltype) has to be CV_32S or CV_16U
    numLabels, labels, stats, centroids = cv2.connectedComponentsWithStats(thresholded, 8, cv2.CV_32S)
    allComponentStats=[]
    # numLabels includes the background (label 0), so the components are 1..numLabels-1
    meanmaxminArray2 = ndimage.labeled_comprehension(array2, labels, np.arange(1, numLabels), calculateMeanMaxMin, np.ndarray, 0)
    meanmaxminArray1 = ndimage.labeled_comprehension(array1, labels, np.arange(1, numLabels), calculateMeanMaxMin, np.ndarray, 0)
    for position, label in enumerate(range(1, numLabels)):
        currentLabel = np.uint8(labels== label)
        contour, _ = cv2.findContours(currentLabel, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
        (side1,side2)=cv2.minAreaRect(contour[0])[1]
        componentStat = stats[label]
        allstats = {'position':centroids[label,:],'area':componentStat[4],'height':componentStat[3],
                              'width':componentStat[2],'meanArray1':meanmaxminArray1[position][0],'maxArray1':meanmaxminArray1[position][1],
                              'minArray1':meanmaxminArray1[position][2],'meanArray2':meanmaxminArray2[position][0],'maxArray2':meanmaxminArray2[position][1],
                              'minArray2':meanmaxminArray2[position][2]}

        if side1 >= side2 and side1 > 0:
            allstats['elongation'] = np.float32(side2 / side1)
        elif side2 > side1 and side2 > 0:
            allstats['elongation'] = np.float32(side1 / side2)
        else:
            allstats['elongation'] = np.float32(0)
        allComponentStats.append(allstats)
    return allComponentStats

EDIT

Both arrays are 2D arrays:

array1= np.random.choice(255,(512,512)).astype(np.uint8)
array2= np.random.choice(255,(512,512)).astype(np.uint8)

EDIT2

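A small example with two data arrays and a labelArray containing two components (1 and 2, plus background 0). The minimum, maximum and mean are computed with ndimage.labeled_comprehension.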

from scipy import ndimage
import numpy as np

labelArray = np.array([[0,1,1,1],[2,2,1,1],[2,2,0,1]])
data = np.array([[0.1,0.2,0.99,0.2],[0.34,0.43,0.87,0.33],[0.22,0.53,0.1,0.456]])
data2 = np.array([[0.1,0.2,0.99,0.2],[0.1,0.2,0.99,0.2],[0.1,0.2,0.99,0.2]])
numLabels = 2

minimumDataForAllLabels = ndimage.labeled_comprehension(data, labelArray, np.arange(1, numLabels+1), np.min, np.ndarray, 0)
minimumData2ForallLabels = ndimage.labeled_comprehension(data2, labelArray, np.arange(1, numLabels+1), np.min, np.ndarray, 0)
print(minimumDataForAllLabels)
print(minimumData2ForallLabels)
print(bin_and_do_simple_stats(labelArray.flatten(),data.flatten()))

Output:

[0.2 0.22] ##minimum of component 1 and 2 from data
[0.2 0.1] ##minimum of component 1 and 2 from data2
[0.1  0.2  0.22] ##minimum output of bin_and_do_simple_stats from data

Best answer

labeled_comprehension is definitely slow.
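
As a point of comparison before turning to the binning approach: scipy.ndimage also ships dedicated per-label reducers (ndimage.minimum, ndimage.maximum, ndimage.mean) that take the same labels and index arguments as labeled_comprehension. The sketch below applies them to the small example from the question. Whether they are fast enough for a given image size is an assumption worth timing, not something measured here.

```python
import numpy as np
from scipy import ndimage

# Small example from the question: components 1 and 2, background 0.
labels = np.array([[0, 1, 1, 1],
                   [2, 2, 1, 1],
                   [2, 2, 0, 1]])
data = np.array([[0.1, 0.2, 0.99, 0.2],
                 [0.34, 0.43, 0.87, 0.33],
                 [0.22, 0.53, 0.1, 0.456]])

# Component labels only; the background label 0 is skipped.
index = np.arange(1, labels.max() + 1)

mins = ndimage.minimum(data, labels, index)
maxs = ndimage.maximum(data, labels, index)
means = ndimage.mean(data, labels, index)

print(mins, maxs, means)
```

The minima agree with the labeled_comprehension output shown in the question for the first data array.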

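Based on the linked post, at least the simple stats can be computed much faster. For simplicity I only handle one data array, but since the procedure returns the sort indices it can easily be extended to multiple arrays: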

import numpy as np
from scipy import sparse
try:
    from stb_pthr import sort_to_bins as _stb_pthr
    HAVE_PYTHRAN = True
except ImportError:
    HAVE_PYTHRAN = False

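# fallback if pythran not available

def sort_to_bins_sparse(idx, data, mx=-1):
    if mx==-1:
        mx = idx.max() + 1
    # CSR matrix with exactly one entry per row: row i holds data[i] in column idx[i].
    # Converting it to CSC groups the entries by column, i.e. by bin.
    aux = sparse.csr_matrix((data, idx, np.arange(len(idx)+1)), (len(idx), mx)).tocsc()
    # data sorted by bin, original positions of the sorted data, bin boundaries
    return aux.data, aux.indices, aux.indptr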

def sort_to_bins_pythran(idx, data, mx=-1):
    indices, indptr = _stb_pthr(idx, mx)
    return data[indices], indices, indptr

# pick best available

sort_to_bins = sort_to_bins_pythran if HAVE_PYTHRAN else sort_to_bins_sparse

# example data

idx = np.random.randint(0,10,(100000))
data = np.random.random(100000)

# if possible compare the two methods

if HAVE_PYTHRAN:
    dsp,isp,psp = sort_to_bins_sparse(idx,data)
    dph,iph,pph = sort_to_bins_pythran(idx,data)

    assert (dsp==dph).all()
    assert (isp==iph).all()
    assert (psp==pph).all()

# example how to do simple vectorized calculations

def simple_stats(data,iptr):
    # reduceat reduces each slice data[iptr[i]:iptr[i+1]]; this assumes no bin is empty
    minimum = np.minimum.reduceat(data,iptr[:-1])
    mean = np.add.reduceat(data,iptr[:-1]) / np.diff(iptr)
    return minimum, mean

def bin_and_do_simple_stats(idx,data,mx=-1):
    data,indices,indptr = sort_to_bins(idx,data,mx)
    return simple_stats(data,indptr)

print("minima: {}\n mean values: {}".format(*bin_and_do_simple_stats(idx,data)))
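
The extension to several data arrays mentioned above could look roughly like the following sketch: the labels are sorted once, and the resulting order and bin boundaries are reused for every data array. To keep it self-contained it swaps in np.argsort(kind="stable") and np.bincount for the sparse/pythran binning; the helper names sort_order_and_bounds and stats_per_label are made up for this illustration. It assumes every label from 0 to n_labels-1 occurs at least once.

```python
import numpy as np

def sort_order_and_bounds(label_img, n_labels):
    """Stable sort order of the flattened labels plus the bin boundaries."""
    flat = label_img.ravel()
    order = np.argsort(flat, kind="stable")
    counts = np.bincount(flat, minlength=n_labels)
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return order, indptr

def stats_per_label(values, order, indptr):
    """min, max and mean per label; assumes no label is empty."""
    binned = values.ravel()[order]   # values grouped by label
    starts = indptr[:-1]             # first position of each group
    mins = np.minimum.reduceat(binned, starts)
    maxs = np.maximum.reduceat(binned, starts)
    means = np.add.reduceat(binned, starts) / np.diff(indptr)
    return mins, maxs, means

# Example arrays from the question (labels 1 and 2, background 0).
labelArray = np.array([[0, 1, 1, 1], [2, 2, 1, 1], [2, 2, 0, 1]])
data = np.array([[0.1, 0.2, 0.99, 0.2],
                 [0.34, 0.43, 0.87, 0.33],
                 [0.22, 0.53, 0.1, 0.456]])
data2 = np.array([[0.1, 0.2, 0.99, 0.2]] * 3)

# Sort once, then reuse the order for both data arrays.
order, indptr = sort_order_and_bounds(labelArray, labelArray.max() + 1)
mins1, maxs1, means1 = stats_per_label(data, order, indptr)
mins2, maxs2, means2 = stats_per_label(data2, order, indptr)

# Bin 0 is the background, so [1:] keeps only the components.
print(mins1[1:], mins2[1:])
```

Note that the first entry of each result belongs to the background label 0, which is why the binned output in the question has three values where labeled_comprehension returned two.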

If you have pythran (not required, but a bit faster), compile this as <stb_pthr.py>:

import numpy as np

#pythran export sort_to_bins(int[:], int)

def sort_to_bins(idx, mx):
    if mx==-1:
        mx = idx.max() + 1
    cnts = np.zeros(mx + 2, int)
    for i in range(idx.size):
        cnts[idx[i]+2] += 1
    for i in range(2, cnts.size):
        cnts[i] += cnts[i-1]
    res = np.empty_like(idx)
    for i in range(idx.size):
        res[cnts[idx[i]+1]] = i
        cnts[idx[i]+1] += 1
    return res, cnts[:-1]
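
To make clear what the counting sort above returns, here is a plain-NumPy sketch that should produce the same two arrays: the stable sort order of idx and the bin boundaries. The function name sort_to_bins_numpy is hypothetical and only serves as a cross-check; it is not part of the answer's code and makes no claim about speed.

```python
import numpy as np

def sort_to_bins_numpy(idx, mx=-1):
    """Stable sort indices and bin boundaries (indptr) for integer labels."""
    if mx == -1:
        mx = idx.max() + 1
    # A stable argsort keeps the original order inside each bin,
    # like the counting sort does.
    indices = np.argsort(idx, kind="stable")
    # indptr[k] is the first sorted position of bin k and indptr[-1] == idx.size.
    indptr = np.concatenate(([0], np.cumsum(np.bincount(idx, minlength=mx))))
    return indices, indptr

idx = np.array([2, 0, 1, 0, 2, 2])
indices, indptr = sort_to_bins_numpy(idx)
print(indices)  # [1 3 2 0 4 5]
print(indptr)   # [0 2 3 6]
```

Bin k then consists of the elements at positions indices[indptr[k]:indptr[k+1]] of the original array.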

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/57498039/
