python - Feeling stupid while trying to implement lazy partitioning in Python

Tags: python iterator lazy-evaluation

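I am trying to implement lazy partitioning of an iterator object: it should yield slices of the iterator whenever the value of a function applied to the iterator's elements changes. This would mimic the behavior of Clojure's partition-by (although the semantics of the output would differ, since Python would genuinely "consume" the elements). My implementation is optimal in the number of operations it performs, but not in the memory it requires. I don't see why a good implementation would need more than O(1) memory, yet mine takes O(k) memory, where k is the size of the partition. I would like to be able to handle cases where k is large. Does anyone know of a good implementation?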

The correct behavior should look like this:

>>> unagi = [-1, 3, 4, 7, -2, 1, -3, -5]
>>> parts = partitionby(lambda x: x < 0, unagi)
>>> print([[y for y in x] for x in parts])
[[-1], [3, 4, 7], [-2], [1], [-3, -5]]

Here is my current version:

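from itertools import chain

def partitionby(f, iterable):
    seq = iter(iterable)
    current = next(seq)    # first element of the partition being built
    justseen = next(seq)   # lookahead element
    partition = iter([current])
    while True:
        if f(current) == f(justseen):
            # Same key: append by wrapping the existing chain in a new one,
            # so a partition of k elements nests about k chain objects.
            partition = chain(partition, iter([justseen]))
            try:
                justseen = next(seq)
            except StopIteration:
                yield partition
                break
        else:
            # Key changed: emit the finished partition and start a new one.
            yield partition
            current = justseen
            partition = iter([])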

Best answer

Why not reuse groupby? I think it uses O(1) memory.

from itertools import groupby

def partitionby(f, iterable):
    return (g[1] for g in groupby(iterable, f))

The difference between groupby's implementation and yours is that the partition is a specialized iterator object, instead of a chain of chain of chain ...
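To make that concrete, here is a small sketch of how the groupby-based version could be used. The `numbers` generator and the variable names are made up for illustration, and the only assumption is the standard `itertools.groupby`. It reproduces the expected output from the question, then runs over a generator with one long partition without ever building that partition as a list.

```python
from itertools import groupby


def partitionby(f, iterable):
    # Same idea as the answer: keep only the group iterators, drop the keys.
    return (group for _, group in groupby(iterable, f))


unagi = [-1, 3, 4, 7, -2, 1, -3, -5]

# Each partition is consumed (turned into a list) before the next is requested.
result = [list(part) for part in partitionby(lambda x: x < 0, unagi)]
print(result)


def numbers():
    # Hypothetical lazy source: one long run of positives, then one negative.
    for n in range(1, 100001):
        yield n
    yield -1


# Count the elements of each partition without storing them.
sizes = [sum(1 for _ in part) for part in partitionby(lambda x: x < 0, numbers())]
print(sizes)
```

One thing to keep in mind: each group shares the underlying iterator with groupby, so a partition should be consumed before asking for the next one. Collecting all partitions first and reading them afterwards will not give back the elements.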

Source: this question was originally asked on Stack Overflow: https://stackoverflow.com/questions/5835685/
