python - Extracting the indices of sublists found in a list

Tags: python arrays python-3.x list numpy

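I am given list_a and list_b. I want to run list_b through a function that returns every possible sublist of list_b (this part of the code works). I then want to take each sublist of list_b and check whether it is also a sublist of list_a. If it is, I should get a list of all the indexes, or slices, at which that sublist occurs in list_a.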

I was able to get the code working for sublists of length one, but I cannot get it to work for longer sublists.

Here is my current code for this problem:

import numpy as np

a = [0,1,2,3,0,2,3]
b = [0,2,3]

sublists = []
def all_sublists(my_list):
    """Append every contiguous sublist of my_list to the global `sublists`."""
    for i in range(len(my_list)):
        n = i+1
        while n <= len(my_list):
            sublist = my_list[i:n]
            sublists.append(sublist)
            n += 1

def sublists_splice(sublist, my_list):
    """if sublist is in my_list print sublist and the corresponding indexes"""
    values = np.array(my_list)
    print(str(sublist) + " found at " + str(np.where(values == sublist)[0]))

all_sublists(b)
for sublist in sublists:
    sublists_splice(sublist, a)

Here is the output of the code:

[0] found at [0 4]
[0, 2] found at []
[0, 2, 3] found at []
[2] found at [2 5]
[2, 3] found at []
[3] found at [3 6]
/home/nicholas/Desktop/sublists.py:27: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.
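
The empty results and the warning are consistent with `values == sublist` comparing the whole 7-element array against the list elementwise: a one-element list broadcasts, but a list of any other length has an incompatible shape. One possible way around this, sketched here under the assumption that a plain loop over windows is acceptable, is to compare only equal-length slices (`k` and `starts` are illustrative names, not part of the original code):

```python
import numpy as np

values = np.array([0, 1, 2, 3, 0, 2, 3])
sublist = [0, 2]
k = len(sublist)

# Compare each length-k window of `values` with `sublist`.
# Both sides have the same length, so the comparison is well defined.
starts = [i for i in range(len(values) - k + 1)
          if np.array_equal(values[i:i + k], sublist)]
print(starts)  # [4]
```

Each start index `i` corresponds to the slice `i:i+k` of the original list.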

This is what I would like to get:

[0] found at [0 4]
[0, 2] found at [4:6]
[0, 2, 3] found at [4:7]
[2] found at [2 5]
[2, 3] found at [2:4 5:7]
[3] found at [3 6]

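I assume there is a Pythonic way to solve this. I have tried a few approaches, but they all ended up very long and did not work...

One last note: I do need these to be sublists rather than subsets, because order matters.

I would appreciate any help. Thanks.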

Best answer

Using the tools from "Find boolean mask by pattern":

import numpy as np

def rolling_window(a, window):  #https://stackoverflow.com/q/7100242/2901002
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    c = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    return c

def vview(a):  #based on @jaime's answer: https://stackoverflow.com/a/16973510/4427777
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))

def sublist_loc(a, b):
    a, b = np.array(a), np.array(b)
    n = min(len(b), len(a))   
    sublists = [rolling_window(b, i) for i in range(1, n + 1)]
    target_lists = [rolling_window(a, i) for i in range(1, n + 1)]
    locs = [[np.flatnonzero(vview(target_lists[i]) == s) for s in vview(subl)]
            for i, subl in enumerate(sublists)]
    for i in range(n):
        for j, k in enumerate(sublists[i]):
            print(str(k) + " found starting at index " + str(locs[i][j]))
    return sublists, target_lists, locs


_ = sublist_loc(a, b)
[0] found starting at index [0 4]
[2] found starting at index [2 5]
[3] found starting at index [3 6]
[0 2] found starting at index [4]
[2 3] found starting at index [2 5]
[0 2 3] found starting at index [4]
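
The output above reports start indices, whereas the question asked for slices. A match of length k starting at index i covers the slice i:i+k, so the conversion is a one-liner. A minimal sketch, where `to_slices` is a hypothetical helper that is not part of the original answer:

```python
def to_slices(starts, length):
    """Turn start indices of length-`length` matches into 'start:stop' strings."""
    return [f"{int(s)}:{int(s) + length}" for s in starts]

print(to_slices([2, 5], 2))  # ['2:4', '5:7']
print(to_slices([4], 3))     # ['4:7']
```

The `int(...)` calls are there so that NumPy integers, such as those in `locs`, format the same way as plain Python integers.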

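As an added bonus, every rolling_window call returns only a view of the original array, so storing the returned sublists and target_lists does not take much memory. The vview results are different: np.ascontiguousarray has to copy any window array that is not already contiguous, but those copies are temporary and are used only for the comparison.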
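
On NumPy 1.20 or later, a similar windowed view is available through `numpy.lib.stride_tricks.sliding_window_view`, which avoids hand-written strides. The following is only a sketch of that alternative, not the answer's method; `find_sublist` is a hypothetical name, and the comparison does allocate a temporary boolean array:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_sublist(haystack, needle):
    """Return the start indices where `needle` occurs contiguously in `haystack`."""
    haystack, needle = np.asarray(haystack), np.asarray(needle)
    if len(needle) == 0 or len(needle) > len(haystack):
        return np.array([], dtype=int)
    # Read-only view of shape (len(haystack) - len(needle) + 1, len(needle)).
    windows = sliding_window_view(haystack, len(needle))
    # A window matches when every element equals the corresponding needle element.
    return np.flatnonzero((windows == needle).all(axis=1))

a = [0, 1, 2, 3, 0, 2, 3]
print(find_sublist(a, [2, 3]))     # [2 5]
print(find_sublist(a, [0, 2, 3]))  # [4]
```

Calling it once per sublist of `b` would reproduce the start indices printed by `sublist_loc`.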

This question, python - Extracting the indices of sublists found in a list, corresponds to a similar question found on Stack Overflow: https://stackoverflow.com/questions/49293393/
