python - Using lookahead with generators

I have implemented a generator-based scanner in Python that tokenizes a string into tuples of the form (token type, token value):

for token in scan("a(b)"):
    print(token)

would print

("literal", "a")
("l_paren", "(")
...

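The next task is to parse the token stream, and for that I need to look one item ahead of the current one without advancing the pointer. Iterators and generators do not provide the complete sequence of items at once; they produce each item on demand. That makes lookahead a bit trickier than with lists, because the next item is unknown until __next__() is called.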
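To make the difficulty concrete, here is a minimal sketch. The generator scan_demo() is a hypothetical stand-in for the scanner, not the actual scan() from the question. It shows that "peeking" with next() removes the item from the stream.

```python
def scan_demo():
    # Hypothetical stand-in for scan(): yields fixed (type, value) tuples.
    yield ("literal", "a")
    yield ("l_paren", "(")
    yield ("literal", "b")
    yield ("r_paren", ")")

tokens = scan_demo()
peeked = next(tokens)      # looking at the first token already consumes it
remaining = list(tokens)   # the peeked token is no longer part of the stream

print(peeked)
print(remaining)
```

Any lookahead scheme therefore has to remember the peeked item somewhere so that it can still be handed to the parser afterwards.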

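What would a straightforward implementation of generator-based lookahead look like? Currently I am using a workaround that turns the generator's output into a list: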

token_list = [token for token in scan(string)]

The lookahead is then easily implemented like this:

try:
    next_token = token_list[index + 1]
except IndexError:
    # no token follows the last one
    next_token = None

Of course, this works fine. But thinking it over, a second question arises: does it really make sense to make scan() a generator in the first place?
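For reference, the list-based workaround described above might look like this as a runnable whole. Both scan_demo() and with_next() are hypothetical names introduced only for illustration; the real scan() is not shown in the question.

```python
def scan_demo(text):
    # Hypothetical stand-in for scan(): classifies each character.
    kinds = {"(": "l_paren", ")": "r_paren"}
    for ch in text:
        yield (kinds.get(ch, "literal"), ch)

def with_next(tokens):
    """Return (token, next_token) pairs; next_token is None at the end."""
    token_list = list(tokens)  # materializes the whole stream in memory
    pairs = []
    for index, token in enumerate(token_list):
        try:
            next_token = token_list[index + 1]
        except IndexError:
            next_token = None
        pairs.append((token, next_token))
    return pairs

pairs = with_next(scan_demo("a(b)"))
for token, next_token in pairs:
    print(token, next_token)
```

The cost of this approach is that the entire token stream is held in memory before parsing starts, which gives up the on-demand behavior of the generator.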

Best answer

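The answers there are pretty good, but my favorite approach is to use itertools.tee: given an iterator, it returns two (or more, if requested) iterators that can be advanced independently. It buffers in memory only as much as needed (that is, not much, as long as the iterators do not get very far "out of step" with each other). For example: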

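import itertools
import collections.abc

# collections.Iterator was removed in Python 3.10; use collections.abc.Iterator
class IteratorWithLookahead(collections.abc.Iterator):
  def __init__(self, it):
    # tee yields two independent iterators over the same underlying stream
    self.it, self.nextit = itertools.tee(iter(it))
    # keep nextit one item ahead of it
    self._advance()
  def _advance(self):
    # None marks the end of the stream
    self.lookahead = next(self.nextit, None)
  def __next__(self):
    self._advance()
    return next(self.it)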

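You can wrap any iterator with this class and then use the wrapper's .lookahead attribute to see which item will be returned next. I like leaving all the real logic to itertools.tee and providing only this thin glue!-)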
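A short usage sketch may help. The wrapper is repeated here, written against collections.abc.Iterator, so that the snippet runs on its own. scan_demo() is a hypothetical stand-in for the question's scan(), which is not shown.

```python
import collections.abc
import itertools

class IteratorWithLookahead(collections.abc.Iterator):
    # The wrapper from the answer, repeated so this snippet is self-contained.
    def __init__(self, it):
        self.it, self.nextit = itertools.tee(iter(it))
        self._advance()

    def _advance(self):
        self.lookahead = next(self.nextit, None)

    def __next__(self):
        self._advance()
        return next(self.it)

def scan_demo(text):
    # Hypothetical stand-in for scan(): classifies each character.
    kinds = {"(": "l_paren", ")": "r_paren"}
    for ch in text:
        yield (kinds.get(ch, "literal"), ch)

tokens = IteratorWithLookahead(scan_demo("a(b)"))
first_peek = tokens.lookahead  # before any next() call: the first token

seen = []
for token in tokens:
    # tokens.lookahead is the item the next iteration would return,
    # or None once the stream is exhausted.
    seen.append((token[1], tokens.lookahead))

print(seen)
```

One caveat with this sketch: because None signals the end of the stream, a source that can legitimately yield None would need a dedicated sentinel object instead.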

For more on python - using lookahead with generators, see the similar question on Stack Overflow: https://stackoverflow.com/questions/1517862/
