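I have a situation where I need to find the optimal distribution of multiple items with the value 1 in a numpy array. Suppose I have the following array, which contains only 0s and 1s in random order: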
import numpy as np
# this 1d array can have up to 10000 elements
data = np.array([
0, 0, 0, 0, 0, 1, 0,
0, 1, 0, 0, 0, 1, 1,
1, 1, 0, 0, 0, 1, 1,
0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 0, 0, 1,
])
num_of_ones_to_fill_gaps = 5
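In addition, I have a certain number n of 1s (num_of_ones_to_fill_gaps) that should be distributed in the array so that the longest possible contiguous sequences of 1s are built. With num_of_ones_to_fill_gaps=5 (five 1s can be used to fill gaps of 0 values), there are, for example, 3 results with the longest sequence of 1s, each with a sequence length of 11.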
a)                      | b)                      | c)
result = np.array([     | result = np.array([     | result = np.array([
  0, 0, 0, 0, 0, 1, 0,  |   0, 0, 0, 0, 0, 1, 1,  |   0, 0, 0, 0, 0, 1, 1,
                        |                         |                  ^  ^
  0, 1, 0, 0, 0, 1, 1,  |   0, 1, 0, 0, 0, 1, 1,  |   0, 1, 0, 0, 0, 1, 1,
                        |                         |   ^  ^  ^  ^  ^  ^  ^
  1, 1, 0, 0, 0, 1, 1,  |   1, 1, 0, 0, 0, 1, 1,  |   1, 1, 0, 0, 0, 1, 1,
                        |                         |   ^  ^
  0, 0, 0, 0, 1, 1, 1,  |   0, 0, 0, 1, 1, 1, 1,  |   0, 0, 0, 0, 1, 1, 1,
              ^  ^  ^   |            ^  ^  ^  ^   |
  1, 1, 1, 1, 1, 1, 1,  |   1, 1, 1, 1, 1, 1, 1,  |   0, 0, 0, 0, 1, 1, 1,
  ^  ^  ^  ^  ^  ^  ^   |   ^  ^  ^  ^  ^  ^  ^   |
  1, 0, 0, 0, 0, 0, 1,  |   0, 0, 0, 0, 0, 0, 1,  |   0, 0, 0, 0, 0, 0, 1,
  ^                     |                         |
])                      | ])                      | ])
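My first question is whether numpy provides a built-in vectorized method that computes the longest possible sequences of 1s and returns the start and end indices of the (possibly multiple) results of equal length: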
result = np.array([
(22, 32),
(21, 31),
(5, 15),
])
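My second question is whether there is a vectorized numpy method that extracts all possible sequences of 1s (with filled gaps), regardless of their length. The result could look something like this: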
result = np.array([
(0, 4), # data[0:4], data.size == 5
(1, 6), # data[1:6], data.size == 6 because index at position 5 is a 1
(2, 7), # data[2:7], data.size == 6 because index at position 5 is a 1
(3, 9), # data[3:9], data.size == 7 because indices at position 5 and 8 are a 1
...
])
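The pairs listed above are consistent with reading each entry as (start, index of the 5th zero at or after start). Under that reading, a minimal vectorized sketch could use `np.flatnonzero` and `np.searchsorted`. The name `windows_per_start` is a hypothetical helper, not a numpy function. Two further assumptions: the array contains only 0 and 1, and a start with fewer than k zeros remaining simply runs to the last index.

```python
import numpy as np

def windows_per_start(data, k):
    """For every start index, return (start, end) where end is the index
    of the k-th zero at or after start (inclusive indices).

    Assumption: if fewer than k zeros remain, end is the last index.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    data = np.asarray(data)
    n = data.size
    starts = np.arange(n)
    zeros = np.flatnonzero(data == 0)  # sorted positions of all zeros
    if zeros.size == 0:
        # Nothing to fill: every window runs to the end of the array.
        return np.column_stack((starts, np.full(n, n - 1)))
    # Index into `zeros` of the first zero at or after each start.
    first = np.searchsorted(zeros, starts, side="left")
    # Index into `zeros` of the k-th zero at or after each start.
    kth = first + k - 1
    # Clip before indexing so the lookup is always valid, then overwrite
    # the entries where fewer than k zeros remain.
    ends = np.where(kth < zeros.size,
                    zeros[np.minimum(kth, zeros.size - 1)],
                    n - 1)
    return np.column_stack((starts, ends))

data = np.array([
    0, 0, 0, 0, 0, 1, 0,
    0, 1, 0, 0, 0, 1, 1,
    1, 1, 0, 0, 0, 1, 1,
    0, 0, 0, 0, 1, 1, 1,
    0, 0, 0, 0, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 1,
])
print(windows_per_start(data, 5)[:4].tolist())
# [[0, 4], [1, 6], [2, 7], [3, 9]]
```

This sketch stops at the last filled zero, as the listing does; it does not extend a window across any 1s that follow it.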
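I have tried to outline the problem in an understandable way. I searched the documentation and Stack Overflow but do not know how to get started. All I found were iterative solutions. Any suggestions and solutions are highly appreciated. Thanks again!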
Best answer
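This would be my current solution, assuming I can fill the values into the free spots (i.e. the zeros) in any combination.
Disclaimer: I have not tested it extensively.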
from itertools import combinations
import numpy as np
# scipy.ndimage.measurements is a deprecated namespace; import from scipy.ndimage
from scipy.ndimage import find_objects, label
data = np.array(
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1,]
)
m = len(data)
num_of_ones_to_fill_gaps = 4
# Find all possible combinations of indexes which we could set to 1
zero_idxs, = np.where(np.equal(data, 0))
combs = list(combinations(zero_idxs, num_of_ones_to_fill_gaps))
# Convert combinations into one-hot vectors; the len of each vector
# is equal to the len(data)
combs_onehot = np.eye(m)[np.asarray(combs)]
# Summing on the first axis will give us masks that we can directly
# sum to the original array. For example, if we had two 1s to insert
# and a possible combination were (0, 1), combs_onehot would become
# ([1, 0, 0, ...], [0, 1, 0, 0, ...]) and summing would give us the
# mask [1, 1, 0, 0, ...]
masks = np.sum(combs_onehot, axis=1).astype(int)
# Broadcast sum of the mask to original array. If our original array
# had len M and we found N possible combinations, this has shape (N, M)
data_filled = data + masks
# 1-D connected component labeling
str_el = np.asarray([[0,0,0], [1,1,1], [0,0,0]])
labeled, _ = label(data_filled, structure=str_el)
slices = find_objects(labeled)
longest = max(slices, key=lambda x: x[1].stop - x[1].start)
longest_row = longest[0].start
print(f'Best solution: {combs[longest_row]}')
print(f'Longest run: {longest[1].stop - longest[1].start}')
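The number of combinations grows quickly with the number of zeros, so the enumeration above may become impractical for arrays approaching 10000 elements. As a possible alternative, here is a minimal sketch that works on the positions of the zeros instead. A maximal window that fills k consecutive zeros starts just after the previous zero and ends just before the next one, so all candidate windows come from two slices of a padded index array. The name `longest_filled_runs` is a hypothetical helper, not a numpy or scipy function. It assumes the array contains only 0 and 1 and returns inclusive (start, end) pairs.

```python
import numpy as np

def longest_filled_runs(data, k):
    """Return all inclusive (start, end) windows of maximal length that
    contain at most k zeros (assumes data holds only 0 and 1)."""
    data = np.asarray(data)
    n = data.size
    if n == 0:
        return np.empty((0, 2), dtype=int)
    zeros = np.flatnonzero(data == 0)  # sorted positions of all zeros
    if zeros.size <= k:
        # Every zero can be filled, so the whole array is one run.
        return np.array([[0, n - 1]])
    # Sentinels: a virtual zero before the array and one after it.
    padded = np.concatenate(([-1], zeros, [n]))
    # Window i fills zeros i .. i+k-1; it starts right after the zero
    # before them and ends right before the zero after them.
    starts = padded[:-(k + 1)] + 1
    ends = padded[k + 1:] - 1
    lengths = ends - starts + 1
    best = lengths == lengths.max()
    return np.column_stack((starts[best], ends[best]))

data = np.array(
    [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
)
print(longest_filled_runs(data, 4).tolist())
# [[11, 20]]
```

Unlike the `max` call above, which keeps a single best slice, this sketch reports every window that ties for the maximum length.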
Regarding "python - Find all sequences (and the longest) of a certain value in a numpy array", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58990449/