python - Extracting the boundaries of dense regions of 1s in a huge list of 1s and 0s

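I'm not sure how to phrase my question, but here goes...

I have a huge list of 1s and 0s [total length = 53820].

Example of the list: [0,1,1,1,1,1,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1... .......]

It is visualized below.

x-axis: index of the element (from 0 to 53820)

y-axis: value at that index (i.e. 1 or 0)

[Figure: input plot]

The plot clearly shows 3 dense regions where 1s occur more often. I drew on top of the plot to mark the visually dense regions (the ugly black lines on the plot). I want to know the x-axis index numbers of the boundaries (start and end) of each dense region.

I have already extracted the blocks of 1s and saved the start index of each block in a new list called "starts". The function returns a list of dictionaries like this: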

{'start': 0, 'count': 15, 'end': 16}, {'start': 2138, 'count': 3, 'end': 2142}, {'start': 2142, 'count': 3, 'end': 2146}, {'start': 2461, 'count': 1, 'end': 2463}, {'start': 2479, 'count': 45, 'end': 2525}, {'start': 2540, 'count': 2, 'end': 2543}
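The block-extraction function itself is not shown in the question. A minimal sketch of one way to get such blocks with NumPy follows; `find_runs` is a hypothetical name, and here `end` is the exclusive end index (`start + count`), which may not match the convention used in the output above.

```python
import numpy as np

def find_runs(bits):
    """Return one dict per run of consecutive 1s (hypothetical helper)."""
    bits = np.asarray(bits, dtype=int)
    # Pad with a 0 on both sides so runs touching either end are detected.
    padded = np.concatenate(([0], bits, [0]))
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)   # index of the first 1 in each run
    ends = np.flatnonzero(change == -1)    # index just past the last 1 (exclusive)
    return [
        {'start': int(s), 'count': int(e - s), 'end': int(e)}
        for s, e in zip(starts, ends)
    ]

sample = [0,1,1,1,1,1,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1]
runs = find_runs(sample)
starts = [r['start'] for r in runs]
```

For the short sample list this yields three runs, starting at indices 1, 12 and 20.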

Then, after setting a threshold, I compare adjacent elements of starts. This returns the apparent boundaries of the dense regions.

THR = 2000
results = []
cues = {'start': 0, 'stop': 0}
result, starts = densest(preds)  # Function that returns the list of dictionaries shown above
cuestart = False  # Flag to check if looking for start or stop of dense boundary
for i, j in zip(range(0, len(starts)), range(1, len(starts))):
    now = starts[i]
    nextf = starts[j]

    if nextf - now > THR:
        if not cuestart:
            cues['start'] = nextf
            cues['stop'] = nextf
            cuestart = True

        else:  # cuestart is already set
            cues['stop'] = now
            cuestart = False
            results.append(cues)
            cues = {'start': 0, 'stop': 0}

print('\n', results)

The output and the corresponding plot are shown below.

[{'start': 2138, 'stop': 6654}, {'start': 23785, 'stop': 31553}, {'start': 38765, 'stop': 38765}]

[Figure: output plot]

This method fails to capture the last dense region shown in the plot, and it also fails on other data of a similar kind.
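One possible reason, judging only from the loop as posted: a region is appended only when a second large gap is seen, so a final region that is not followed by a large gap is never closed. A minimal sketch of an alternative is to split the sorted start indices wherever the gap exceeds the threshold and to close the last group after the loop. `group_by_gap` and `max_gap` are hypothetical names, not part of the original code.

```python
def group_by_gap(starts, max_gap=2000):
    """Group sorted block start indices into regions separated by large gaps."""
    regions = []
    if not starts:
        return regions
    region_start = prev = starts[0]
    for s in starts[1:]:
        if s - prev > max_gap:
            # A large gap ends the current region at the previous block.
            regions.append({'start': region_start, 'stop': prev})
            region_start = s
        prev = s
    # Close the last region, which no gap follows.
    regions.append({'start': region_start, 'stop': prev})
    return regions

regions = group_by_gap([0, 2138, 2142, 2461, 2479, 2540], max_gap=2000)
```

Note that an isolated block becomes a region whose start and stop coincide; such regions could be filtered out afterwards, for example by requiring a minimum number of blocks per region.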

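P.S. I also tried seaborn's KDE plot and "distplot" on this data, but those give me the plot directly and I cannot extract the boundary values from it. The link to that question is here (Getting dense region boundary values from output of KDE plot)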

Best answer

OK, you need an answer...

First, the imports (we are going to use LineCollection)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

Next, the definition of the constants

N = 1001
np.random.seed(20190515)

and the generation of fake data

x = np.linspace(0, 1, N)
prob = np.where(x < 0.4, 0.02, np.where(x < 0.7, 0.95, 0.02))
y = np.where(np.random.rand(N) < prob, 1, 0)

Here we create the line collection; sticks is an N×2×2 array containing the start and end points of the vertical lines

sticks = np.array(list(zip(zip(x, np.zeros(N)), zip(x, y))))
lc = LineCollection(sticks)

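Finally, the cumulative sum, here normalized to the same scale as the vertical lines. Subtracting 0.5 makes each 1 contribute +0.5 and each 0 contribute -0.5, so the sum rises inside dense regions and falls elsewhere.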

cs = (y - 0.5).cumsum()
csmin, csmax = min(cs), max(cs)
cs = (cs - csmin) / (csmax - csmin)  # normalized to the range 0 to 1

We just have to plot our results

f, a = plt.subplots()
a.add_collection(lc)
a.plot(x, cs, color='red')
a.grid()
a.autoscale()
plt.show()

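Here is the plot

[Figure: vertical sticks with the normalized cumulative sum in red]

and here is a detail of the stop zone.

[Figure: detail of the stop zone]

You can smooth the cs data and use something from scipy.optimize to find the positions of the extrema. If you have problems with this last step, please ask another question.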
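The answer leaves this last step open. A minimal sketch of one way to do it, using only NumPy instead of scipy.optimize: smooth the unnormalized cumulative sum with a moving average, then treat each local minimum as the start of a dense region and the following local maximum as its end. `dense_regions` and `window` are hypothetical names, and the window length is an assumption that has to be tuned to the data.

```python
import numpy as np

def dense_regions(y, window=51):
    """Return (first_index, last_index) pairs for regions where 1s dominate."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    y = np.asarray(y, dtype=float)
    cs = (y - 0.5).cumsum()
    # Moving average; edge padding keeps the output the same length as cs.
    padded = np.pad(cs, window // 2, mode="edge")
    smooth = np.convolve(padded, np.ones(window) / window, mode="valid")
    rising = np.diff(smooth) > 0          # True where the smoothed sum goes up
    edges = np.diff(rising.astype(int))
    # edges[j] == 1: local minimum of smooth at j + 1, first dense index is j + 2.
    starts = np.flatnonzero(edges == 1) + 2
    # edges[j] == -1: local maximum of smooth at j + 1, the last dense index.
    stops = np.flatnonzero(edges == -1) + 1
    if rising[0]:                         # data begins inside a dense region
        starts = np.r_[0, starts]
    if rising[-1]:                        # data ends inside a dense region
        stops = np.r_[stops, len(y) - 1]
    return list(zip(starts.tolist(), stops.tolist()))

# Usage on fake data similar to the answer's (different random generator).
rng = np.random.default_rng(20190515)
x = np.linspace(0, 1, 1001)
prob = np.where(x < 0.4, 0.02, np.where(x < 0.7, 0.95, 0.02))
y = np.where(rng.random(1001) < prob, 1, 0)
print(dense_regions(y))
```

With this choice of smoothing, a position counts as dense when more than half of the surrounding window consists of 1s. A larger window suppresses short bursts of 1s but blurs the boundaries by a few samples.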

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/56130596/
