python - Find all sequences (and the longest sequence) of a given value in a numpy array

Tags: python numpy numpy-ndarray

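I have a situation where I need to find the best distribution of several items with the value 1 in a numpy array. Suppose I have the following array, which contains only 0s and 1s in random order: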

import numpy as np

# this 1d array can have up to 10000 elements

data = np.array([
0, 0, 0, 0, 0, 1, 0,
0, 1, 0, 0, 0, 1, 1,
1, 1, 0, 0, 0, 1, 1,
0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 0, 0, 1,
])

num_of_ones_to_fill_gaps = 5

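In addition, I have a certain number n of 1s (num_of_ones_to_fill_gaps) that should be distributed over the array so that they build the longest possible contiguous sequence of 1s. With num_of_ones_to_fill_gaps = 5 (five 1s may be used to fill gaps of 0 values), there are, for example, 3 results in which the longest sequence of 1s has a length of 11: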

        a)                           b)                       c)
result = np.array([    |    result = np.array([    |  result = np.array([ 
0, 0, 0, 0, 0, 1, 0,   |    0, 0, 0, 0, 0, 1, 1,   |  0, 0, 0, 0, 0, 1, 1, 
                                                                     ^  ^
0, 1, 0, 0, 0, 1, 1,   |    0, 1, 0, 0, 0, 1, 1,   |  1, 1, 1, 1, 1, 1, 1, 
                                                      ^  ^  ^  ^  ^  ^  ^
1, 1, 0, 0, 0, 1, 1,   |    1, 1, 0, 0, 0, 1, 1,   |  1, 1, 0, 0, 0, 1, 1, 
                                                      ^  ^
0, 0, 0, 0, 1, 1, 1,   |    0, 0, 0, 1, 1, 1, 1,   |  0, 0, 0, 0, 1, 1, 1, 
            ^  ^  ^    |             ^  ^  ^  ^    |   
1, 1, 1, 1, 1, 1, 1,   |    1, 1, 1, 1, 1, 1, 1,   |  0, 0, 0, 0, 1, 1, 1, 
^  ^  ^  ^  ^  ^  ^    |    ^  ^  ^  ^  ^  ^  ^    |   
1, 0, 0, 0, 0, 0, 1,   |    0, 0, 0, 0, 0, 0, 1,   |  0, 0, 0, 0, 0, 0, 1, 
^                      |                           |   
])                     |    ])                     |  ]) 

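My first question is whether numpy offers a built-in, vectorized method that computes the longest possible sequence of 1s and returns the start and end indices of the result (or of several results if they have the same length):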

result = np.array([
(25, 35),
(24, 34),
(5, 15),
])
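To make the requested output concrete, here is a minimal sketch of one way to compute it with plain NumPy. It does not use a built-in for this task; it assumes the problem can be restated as "find the longest windows that contain at most k zeros", and the function name `longest_filled_runs` is made up for illustration. End indices are inclusive, as in the listing above.

```python
import numpy as np

def longest_filled_runs(data, k):
    """Return (start, end) pairs, end inclusive, of the longest windows
    that contain at most k zeros (hypothetical helper, not a NumPy API)."""
    data = np.asarray(data)
    n = data.size
    zeros = np.flatnonzero(data == 0)
    if zeros.size <= k:
        # Enough fills to cover every zero: the whole array is one run.
        return np.array([(0, n - 1)])
    # Sentinels at -1 and n stand in for "a zero just outside the array".
    padded = np.concatenate(([-1], zeros, [n]))
    # A window that fills zeros[i : i + k] may extend from just after the
    # previous zero to just before the next zero.
    starts = padded[: -(k + 1)] + 1
    ends = padded[k + 1 :] - 1
    lengths = ends - starts + 1
    best = lengths == lengths.max()
    return np.column_stack((starts[best], ends[best]))

data = np.array([
    0, 0, 0, 0, 0, 1, 0,
    0, 1, 0, 0, 0, 1, 1,
    1, 1, 0, 0, 0, 1, 1,
    0, 0, 0, 0, 1, 1, 1,
    0, 0, 0, 0, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 1,
])

runs = longest_filled_runs(data, 5)
print(runs)                          # one (start, end) row per longest window
print(runs[0, 1] - runs[0, 0] + 1)   # length of the longest run
```

The work is a handful of array operations over the positions of the zeros, so it should stay cheap for arrays with 10000 elements. Every reported window has the same maximal length, so more than three rows can come back for the sample data.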

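My second question is whether there is a vectorized numpy method that extracts all possible sequences of 1s (with the gaps filled), regardless of their length. The result might look like this: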

result = np.array([
(0, 4),  # data[0:4], data.size == 5
(1, 6),  # data[1:6], data.size == 6 because index at position 5 is a 1
(2, 7),  # data[2:7], data.size == 6 because index at position 5 is a 1
(3, 9),  # data[3:9], data.size == 7 because indices at position 5 and 8 are a 1
...
])
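One possible reading of this second request is: for every start index, report the farthest end index that can be reached while filling at most k zeros. The sketch below follows that reading with a prefix count of zeros and `np.searchsorted`; the name `farthest_ends` is hypothetical, and end indices are inclusive. Because each window is extended as far as possible, some rows reach further than the hand-written listing above.

```python
import numpy as np

def farthest_ends(data, k):
    """For each start index, the last index reachable with at most k zeros
    inside the window (hypothetical helper, end inclusive)."""
    data = np.asarray(data)
    is_zero = data == 0
    # cz[i] = number of zeros in data[0..i], a non-decreasing sequence.
    cz = np.cumsum(is_zero)
    # Number of zeros strictly before each start index.
    zeros_before = cz - is_zero
    # Last index whose prefix count still fits into the budget of k zeros.
    ends = np.searchsorted(cz, zeros_before + k, side="right") - 1
    starts = np.arange(data.size)
    return np.column_stack((starts, ends))

data = np.array([
    0, 0, 0, 0, 0, 1, 0,
    0, 1, 0, 0, 0, 1, 1,
    1, 1, 0, 0, 0, 1, 1,
    0, 0, 0, 0, 1, 1, 1,
    0, 0, 0, 0, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 1,
])

windows = farthest_ends(data, 5)
print(windows[:4])  # first few (start, end) pairs
```

Any shorter window with the same start is also valid, so this table is enough to enumerate every candidate. With k = 0 a start that sits on a zero yields end = start - 1, which denotes an empty window.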

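I have tried to outline the problem in an understandable way. I searched the documentation and Stack Overflow, but I do not know how to get started. All I found were iterative solutions. Any suggestions and solutions are highly appreciated. Thank you!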

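Best answer

This is my current solution, assuming that the values may be placed on the free positions (i.e. the zeros) in any combination.

Disclaimer: I have not tested it extensively.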

from itertools import combinations

import numpy as np
# Import from scipy.ndimage directly; the scipy.ndimage.measurements
#  namespace is deprecated in recent SciPy releases
from scipy.ndimage import find_objects, label


data = np.array(
    [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1,]
)
m = len(data)

num_of_ones_to_fill_gaps = 4

# Find all possible combinations of indexes which we could set to 1
zero_idxs, = np.where(np.equal(data, 0))
combs = list(combinations(zero_idxs, num_of_ones_to_fill_gaps))

# Convert combinations into one-hot vectors; the len of each vector
#  is equal to the len(data)
combs_onehot = np.eye(m)[np.asarray(combs)]

# Summing over axis 1 will give us masks that we can directly
#  sum to the original array. For example, if we had two 1s to insert
#  and a possible combination were (0, 1), combs_onehot would become
#  ([1, 0, 0, ...], [0, 1, 0, 0, ...]) and summing would give us the
#  mask [1, 1, 0, 0, ...]
masks = np.sum(combs_onehot, axis=1).astype(int)

# Broadcast sum of the mask to original array. If our original array
#  had len M and we found N possible combinations, this has shape (N, M)
data_filled = data + masks

# 1-D connected component labeling
str_el = np.asarray([[0,0,0], [1,1,1], [0,0,0]])
labeled, _ = label(data_filled, structure=str_el)

slices = find_objects(labeled)

longest = max(slices, key=lambda x: x[1].stop - x[1].start)
longest_row = longest[0].start

print(f'Best solution: {combs[longest_row]}')
print(f'Longest run: {longest[1].stop - longest[1].start}')
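The labeling step above relies on SciPy. For a single 1-D row, the same runs can be found with NumPy alone by looking at where the padded array switches between 0 and 1. This is only a sketch of that alternative, not part of the answer's method, and the helper name `runs_of_ones` is made up for illustration.

```python
import numpy as np

def runs_of_ones(row):
    """Return (start, stop) pairs of each run of non-zero values in a
    1-D array, with stop exclusive like a Python slice."""
    row = np.asarray(row)
    # Pad with zeros so runs touching either end still produce two edges.
    padded = np.concatenate(([0], row != 0, [0])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # 0 -> 1 transitions
    stops = np.flatnonzero(edges == -1)   # 1 -> 0 transitions
    return np.column_stack((starts, stops))

row = np.array(
    [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
)
runs = runs_of_ones(row)
print(runs)
# The longest run is the row with the largest stop - start.
print(runs[np.argmax(runs[:, 1] - runs[:, 0])])
```

Here the stop index is exclusive, which matches the slices returned by find_objects, so `row[start:stop]` selects exactly one run.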

This question, "python - Find all sequences (and the longest sequence) of a given value in a numpy array", is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/58990449/
