python - Clearing leading occurrences of an item from a list

Tags: python

I want to strip the leading occurrences of 'a' from a list. That is, ['a', 'a', 'b', 'b'] should become ['b', 'b'], while ['b', 'a', 'a', 'b'] should remain unchanged.

def remove_leading_items(l):
    # Stop on an empty list or at the first element that is not 'a'.
    if not l or l[0] != 'a':
        return l
    else:
        return remove_leading_items(l[1:])

Is there a more Pythonic way to do this?

Best answer

Yes. First off, you should be using a for loop; recursion is generally not Pythonic. Second, use the built-in tooling:

from itertools import dropwhile

def remove_leading_items(l, item):
    return list(dropwhile(lambda x: x == item, l))

Or

return list(dropwhile(item.__eq__, l))
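To make the two variants above concrete, here is a minimal, self-contained sketch of the dropwhile approach run against the lists from the question. The default item='a' is an assumption added here for convenience; it is not part of the original answer.

```python
from itertools import dropwhile

def remove_leading_items(l, item='a'):
    # dropwhile skips elements while the predicate is true, then yields
    # everything else unchanged, including later occurrences of item.
    return list(dropwhile(lambda x: x == item, l))

print(remove_leading_items(['a', 'a', 'b', 'b']))  # ['b', 'b']
print(remove_leading_items(['b', 'a', 'a', 'b']))  # ['b', 'a', 'a', 'b']
print(remove_leading_items(['a', 'a']))            # []
```

Only the leading run is removed; occurrences of the item after the first non-matching element are kept.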

Edit

Out of curiosity, I ran some experiments with different approaches to this problem:

from itertools import dropwhile
from functools import partial
from operator import eq

def _eq_drop(l, e):
    return dropwhile(e.__eq__, l)

def lam_drop(l, e):
    return dropwhile(lambda x:x==e, l)

def partial_drop(l, e):
    return dropwhile(partial(eq, e), l)

First, with an input where everything gets dropped, i.e. (1, 1, 1, ...):

In [64]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
    ...: list(_eq_drop(t0, 1))
    ...:
1000 loops, best of 3: 389 µs per loop

In [65]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
    ...: list(lam_drop(t0, 1))
    ...:
1000 loops, best of 3: 1.19 ms per loop

In [66]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
    ...: list(partial_drop(t0, 1))
    ...:
1000 loops, best of 3: 893 µs per loop

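So __eq__ is clearly the fastest in this case. I like it, but it uses a dunder method directly, which is sometimes frowned upon. The dropwhile(partial(eq, ...)) approach (wordy but explicit) sits in between, and the slow, clunky lambda approach comes in last. Not surprising.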
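One concrete reason calling __eq__ directly may be frowned upon is that the dunder is only half of the == protocol. The sketch below uses only the standard library. It shows that int.__eq__ returns NotImplemented for a float argument, while operator.eq (like ==) falls back to the reflected comparison and gives a real boolean. So e.__eq__ is probably safest when the items are known to be of the same type as e.

```python
from operator import eq

# int.__eq__ does not know how to compare with a float, so it
# returns the NotImplemented sentinel instead of True/False.
direct = (1).__eq__(1.0)
print(direct)       # NotImplemented

# operator.eq runs the full == protocol, which retries with
# float.__eq__ and produces a real boolean.
full = eq(1, 1.0)
print(full)         # True
```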


Now, when half of the input is dropped, i.e. (1, 1, 1, ..., 0, 0, 0):

In [52]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
    ...: list(_eq_drop(t2, 1))
    ...:
1000 loops, best of 3: 245 µs per loop

In [53]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
    ...: list(lam_drop(t2, 1))
    ...:
1000 loops, best of 3: 652 µs per loop

In [54]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
    ...: list(partial_drop(t2, 1))
    ...:
1000 loops, best of 3: 487 µs per loop

Here the difference is not as pronounced.


As for why I said recursion is not Pythonic, consider the following timings of the recursive approach:

In [6]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
   ...: remove_leading_items(t0, 1)
   ...:
1 loop, best of 3: 405 ms per loop

In [7]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
   ...: remove_leading_items(t1, 1)
   ...:
10000 loops, best of 3: 34.7 µs per loop

In [8]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
   ...: remove_leading_items(t2, 1)
   ...:
1 loop, best of 3: 280 ms per loop

It performs terribly in every case except the degenerate one where 0 (well, 1) items are dropped.
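Two things likely contribute to these numbers: each l[1:] copies the rest of the sequence, and every dropped item adds a stack frame. The sketch below uses recursive_drop, an assumed two-argument version of the question's function (the benchmark passes an item argument, but that definition is not shown in the answer). It demonstrates that a run of leading items longer than the interpreter's current recursion limit raises RecursionError.

```python
import sys

def recursive_drop(l, item):
    # Hypothetical two-argument variant of the question's function.
    if not l or l[0] != item:
        return l
    return recursive_drop(l[1:], item)  # l[1:] copies the remainder each call

# One leading item per stack frame: exceed the current limit on purpose.
n = sys.getrecursionlimit() + 100
try:
    recursive_drop((1,) * n, 1)
    hit_limit = False
except RecursionError:
    hit_limit = True

print(hit_limit)
print(recursive_drop((1, 1, 0, 1), 1))  # (0, 1)
```

The iterative and dropwhile-based versions have neither problem, since they walk the input once without copying it or growing the stack.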

A fast, least flexible approach

Now, if you know you always want a list, consider a highly iterative approach:

def for_loop(l, e):
    it = iter(l)
    for x in it:
        if x != e:
            break
    else:
        return []
    return [x, *it]

It performs even better than using the built-ins!

In [33]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
    ...: for_loop(t0, 1)
    ...:
1000 loops, best of 3: 270 µs per loop

In [34]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
    ...: for_loop(t1, 1)
    ...:
10000 loops, best of 3: 50.7 µs per loop

In [35]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
    ...: for_loop(t2, 1)
    ...:
10000 loops, best of 3: 160 µs per loop
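As a quick sanity check of the edge cases, the snippet below repeats for_loop and runs it on the lists from the question, an empty list, and a non-list iterable. The for/else branch is what handles inputs that are exhausted without finding a non-matching element.

```python
def for_loop(l, e):
    it = iter(l)
    for x in it:
        if x != e:
            break          # x is the first element to keep
    else:
        return []          # loop never broke: empty input or all items equal e
    return [x, *it]        # x plus whatever is left in the iterator

print(for_loop(['a', 'a', 'b', 'b'], 'a'))  # ['b', 'b']
print(for_loop(['b', 'a', 'a', 'b'], 'a'))  # ['b', 'a', 'a', 'b']
print(for_loop([], 'a'))                    # []
print(for_loop('aaxyz', 'a'))               # ['x', 'y', 'z']
```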

Slower, but more flexible!

Perhaps a good compromise that keeps the flexibility is a generator-based approach:

In [5]: def gen_drop(l, e):
   ...:     it = iter(l)
   ...:     for x in it:
   ...:         if x != e:
   ...:             break
   ...:     else:
   ...:         return  # input exhausted without a non-matching item: yield nothing
   ...:     yield x
   ...:     yield from it
   ...:

In [6]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
   ...: list(gen_drop(t0, 1))
   ...:
1000 loops, best of 3: 287 µs per loop

In [7]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
   ...: list(gen_drop(t1, 1))
   ...:
1000 loops, best of 3: 359 µs per loop

In [8]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
   ...: list(gen_drop(t2, 1))
   ...:
1000 loops, best of 3: 324 µs per loop
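What the generator version buys is laziness: it never materializes the input, so it can be composed with other iterator tools and even fed an unbounded stream. The sketch below restates gen_drop with a for/else guard, so that an input consisting only of the dropped item yields nothing. The use of itertools.chain, repeat, count and islice is an addition for illustration.

```python
from itertools import chain, count, islice, repeat

def gen_drop(l, e):
    it = iter(l)
    for x in it:
        if x != e:
            break
    else:
        return              # nothing left to yield
    yield x
    yield from it

# Three leading zeros followed by an endless 1, 2, 3, ... stream.
stream = chain(repeat(0, 3), count(1))
first_five = list(islice(gen_drop(stream, 0), 5))
print(first_five)                 # [1, 2, 3, 4, 5]
print(list(gen_drop([0, 0], 0)))  # []
```

A list-returning function such as for_loop could not handle this input, because it would try to consume the whole stream.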

Using a deque

Finally, the deque approach:

In [1]: from collections import deque
   ...:
   ...: def noLeadingZero(l, e):
   ...:     d = deque(l)
   ...:     for x in l:
   ...:         if e == x:
   ...:             d.popleft()
   ...:         else:
   ...:             break
   ...:     return list(d)
   ...:

In [2]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
   ...: noLeadingZero(t0, 1)
   ...:
1000 loops, best of 3: 873 µs per loop

In [3]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
   ...: noLeadingZero(t1, 1)
   ...:
10000 loops, best of 3: 121 µs per loop

In [4]: %%timeit n = 10000; t0 = (1,)*n; t1 = (1,) + (0,)*(n-1); t2 = (1,)*(n//2) + (0,)*(n//2);
   ...: noLeadingZero(t2, 1)
   ...:
1000 loops, best of 3: 502 µs per loop
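The timings above come from IPython's %%timeit. To reproduce them in a plain script, something like the following sketch with the standard timeit module should work. The helper name bench, the repeat counts, and the choice of functions are assumptions made here, and deque_drop is a slight variation of noLeadingZero that peeks at d[0] instead of iterating over the original input. Absolute numbers will differ by machine and Python version.

```python
import timeit
from collections import deque
from itertools import dropwhile

def eq_drop(l, e):
    return list(dropwhile(e.__eq__, l))

def deque_drop(l, e):
    d = deque(l)
    while d and d[0] == e:   # peek at the left end instead of iterating l
        d.popleft()
    return list(d)

def bench(funcs, n=10000, number=100):
    # Same three inputs as in the answer: all dropped, one dropped, half dropped.
    inputs = {
        't0': (1,) * n,
        't1': (1,) + (0,) * (n - 1),
        't2': (1,) * (n // 2) + (0,) * (n // 2),
    }
    results = {}
    for f in funcs:
        for name, t in inputs.items():
            # Best of 3 runs, reported as seconds per call.
            best = min(timeit.repeat(lambda: f(t, 1), number=number, repeat=3))
            results[(f.__name__, name)] = best / number
    return results

if __name__ == '__main__':
    for key, secs in bench([eq_drop, deque_drop]).items():
        print(key, f'{secs * 1e6:.1f} µs per loop')
```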

A similar question to "python - Clearing leading occurrences of an item from a list" can be found on Stack Overflow: https://stackoverflow.com/questions/48821856/
