python - strange behaviour of list(iterables) in python3

I have a specific question about how iteration behaves in Python. My iterable is a custom dataset class built on PyTorch:

import torch
from torch.utils.data import Dataset
class datasetTest(Dataset):
    def __init__(self, X):
        self.X = X

    def __len__(self):
        return len(self.X)

    def __getitem__(self, x):
        print('***********')
        print('getitem x = ', x)
        print('###########')
        y = self.X[x]
        print('getitem y = ', y)
        return y

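When I initialise an instance of this datasetTest class, I see strange behaviour. Depending on the data structure I pass as the argument X, calling list(datasetTestInstance) behaves differently. In particular, passing a torch.tensor works without problems, but passing a dict raises a KeyError. The reason is that list(iterable) does not just call i=0, ..., len(iterable)-1; it calls i=0, ..., len(iterable). That is, it iterates up to and including the index equal to the length of the iterable. Obviously, that index is not defined in any Python data structure, because the last element always has index len(datastruct)-1, not len(datastruct). If X is a torch.tensor or a list, no error is raised, even though I think one should be. getitem is still called for the (non-existent) element with index len(datasetTestInstance), but it apparently does not compute y=self.X[len(datasetTestInstance)]. Does anyone know whether PyTorch handles this gracefully somewhere internally?

When a dict is passed as the data, the error is thrown in the last iteration, when x=len(datasetTestInstance). I guess that is actually the expected behaviour. But why does this only happen for a dict and not for a list or a torch.tensor?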

if __name__ == "__main__":
    a = datasetTest(torch.randn(5,2))
    print(len(a))
    print('++++++++++++')
    for i in range(len(a)):
        print(i)
        print(a[i])
    print('++++++++++++')
    print(list(a))

    print('++++++++++++')
    b = datasetTest({0: 12, 1:35, 2:99, 3:27, 4:33})
    print(len(b))
    print('++++++++++++')
    for i in range(len(b)):
        print(i)
        print(b[i])
    print('++++++++++++')
    print(list(b))

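If you want to get a better sense of what I observed, you can try this snippet.

My questions are:

1.) Why does list(iterable) iterate up to and including len(iterable)? A for loop doesn't do that.

2.) If a torch.tensor or a list is passed as the data X: why is no error thrown even though the getitem method is called with index len(datasetTestInstance), which should be out of range because it is not defined as an index of the tensor/list? Or, in other words, what exactly happens when index len(datasetTestInstance) is reached and the getitem method is entered? Apparently it no longer executes "y = self.X[x]" (otherwise there would be an IndexError), but it does enter the getitem method, because I can see the index x being printed from inside it. So what happens in that method? And why does it behave differently depending on whether the data is a torch.tensor/list or a dict?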

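Best answer

This isn't really a PyTorch-specific question; it is a general Python question.

You are building a list with list(iterable), where the iterable's class implements sequence semantics.

Look at the expected behaviour of __getitem__ for sequence types. The most relevant parts are the statement that IndexError should be raised for out-of-range indexes and the note about for loops: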

object.__getitem__(self, key)

Called to implement evaluation of self[key]. For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method. If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. For mapping types, if key is missing (not in the container), KeyError should be raised.

Note: for loops expect that an IndexError will be raised for illegal indexes to allow proper detection of the end of the sequence.

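The point here is that, for sequence types, Python expects __getitem__ to raise an IndexError when it is called with an invalid index. The list constructor appears to rely on this behaviour: it keeps requesting indexes 0, 1, 2, ... until an IndexError signals the end of the sequence. This is why index len(X) is requested at all. With a tensor or a list, self.X[x] is in fact evaluated for x = len(X), but the IndexError it raises is caught by the iteration machinery and treated as the normal end of the sequence. In your example, when X is a dict, accessing a missing key makes __getitem__ raise a KeyError instead of the expected IndexError, so it is not caught and building the list fails.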
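To make the mechanism concrete, here is a rough pure-Python sketch of what iteration falls back to when a class defines __getitem__ but no __iter__. The helper legacy_iter and the class Recording are hypothetical names used only for illustration; the real logic lives inside the interpreter, so treat this as an approximation rather than the actual implementation.

```python
def legacy_iter(obj):
    """Approximate iter(obj) for a class with __getitem__ but no __iter__."""
    index = 0
    while True:
        try:
            item = obj[index]   # calls obj.__getitem__(index)
        except IndexError:
            return              # IndexError is the only "end of sequence" signal
        yield item
        index += 1


class Recording:
    """Hypothetical wrapper that remembers which indexes were requested."""

    def __init__(self, data):
        self.data = data
        self.requested = []

    def __getitem__(self, i):
        self.requested.append(i)
        return self.data[i]     # a list raises IndexError, a dict raises KeyError


# Backed by a list: index 3 is requested, raises IndexError, iteration stops.
seq = Recording([10, 20, 30])
builtin_result = list(seq)
sketch_result = list(legacy_iter(Recording([10, 20, 30])))

# Backed by a dict: index 3 raises KeyError, which nothing catches.
dict_error = None
try:
    list(Recording({0: 10, 1: 20, 2: 30}))
except KeyError as exc:
    dict_error = exc
```

After running this, seq.requested holds 0, 1, 2 and 3: the index equal to the length really is requested, and the resulting IndexError is simply swallowed as the end of the sequence.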

With this information, you could do something like the following:

class datasetTest:
    def __init__(self):
        self.X = {0: 12, 1:35, 2:99, 3:27, 4:33}

    def __len__(self):
        return len(self.X)

    def __getitem__(self, index):
        if index < 0 or index >= len(self):
            raise IndexError
        return self.X[index]

d = datasetTest()
print(list(d))

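I wouldn't recommend doing this in practice, because it relies on the dict X containing only the integer keys 0, 1, ..., len(X)-1. That means it ends up behaving just like a list in most cases, so you are probably better off simply using a list.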
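If the data really has to live in a dict with arbitrary keys, one possible alternative (not part of the original answer) is to fix an ordering of the keys once and index into that list by position. Out-of-range positions then raise IndexError naturally. The class name KeyedDataset and the sample keys are assumptions made for this sketch.

```python
class KeyedDataset:
    """Hypothetical dataset that exposes a dict through positional indexes."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.keys = list(mapping)   # freeze the dict's insertion order

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, index):
        key = self.keys[index]      # raises IndexError when out of range
        return self.mapping[key]


ds = KeyedDataset({"a": 12, "b": 35, "c": 99})
values = list(ds)   # iteration ends cleanly on the IndexError from self.keys
last = ds[-1]       # negative indexes work because self.keys is a list
```

Because the positional lookup goes through a real list, no manual bounds check is needed, and the keys no longer have to be the integers 0 to len(X)-1.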

Regarding python - strange behaviour of list(iterables) in python3, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59091544/
