python - Why is it necessary to typecast a map into a list to assign it to a pandas Series?

I have just started learning the basics of pandas, and there is one thing that made me think.

import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B', 'C']})
data['Column2'] = map(str.lower, data['Column1'])
print(data)

The output of this program is:

   Column1                             Column2
 0       A  <map object at 0x00000205D80BCF98>
 1       B  <map object at 0x00000205D80BCF98>
 2       C  <map object at 0x00000205D80BCF98>

One possible way to get the desired output is to typecast the map object into a list.

import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B', 'C']})
data['Column2'] = list(map(str.lower, data['Column1']))
print(data)

Output:

   Column1 Column2
 0       A       a
 1       B       b
 2       C       c

However, if I use range(), which also returns an object of its own type in Python 3, there is no need to typecast the object into a list.

import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B', 'C']})
data['Column2'] = range(3)
print(data)

Output:

   Column1  Column2
 0       A        0
 1       B        1
 2       C        2

Why does the range object not require typecasting while the map object does?

Best answer

TL;DR: range has __getitem__ and __len__, while map does not.
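The claim is easy to check directly. The snippet below is only a quick sketch using hasattr; the variable names r and m are made up for illustration.

```python
# Compare which special methods a range object and a map object expose.
r = range(3)
m = map(str.lower, ["A", "B", "C"])

for name in ("__len__", "__getitem__", "__iter__", "__next__"):
    print(f"{name:12} range={hasattr(r, name)!s:5} map={hasattr(m, name)}")

print(len(r), r[1])  # range supports len() and integer indexing
print(list(m))       # a map object can only be consumed by iterating over it
```

A range knows its length and can be indexed at any position, whereas a map object is a one-shot iterator that only yields its values in order.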

Details

I'm assuming that the syntax for creating a new dataframe column is, in some way, syntactic sugar for pandas.DataFrame.insert, which takes as its value argument a

scalar, Series, or array-like

Given that, the question seems to reduce to "Why does pandas treat a list and a range as array-like, but not a map?"

See: numpy: formal definition of "array_like" objects?

If you try to make an array out of a range, it works fine, because a range is close enough to array-like, but you can't do the same with a map.

>>> import numpy as np
>>> foo = np.array(range(10))
>>> bar = np.array(map(lambda x: x + 1, range(10)))
>>> foo
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> bar
array(<map object at 0x7f7e553219e8>, dtype=object)

map is not "array-like", while range is.
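If the goal is an actual array of the mapped values, the iterator has to be consumed first. The following is a small sketch of two ways to do that, not something from the original answer: converting to a list, or using np.fromiter with an explicit dtype. The variable names are illustrative.

```python
import numpy as np

# A map object is a one-shot iterator, so np.array() wraps it as a single object.
wrapped = np.array(map(lambda x: x + 1, range(10)))

# Option 1: materialize the iterator into a list first.
from_list = np.array(list(map(lambda x: x + 1, range(10))))

# Option 2: np.fromiter consumes any iterable, given an explicit dtype.
from_iter = np.fromiter(map(lambda x: x + 1, range(10)), dtype=int)

print(wrapped.shape, wrapped.dtype)
print(from_list)
print(from_iter)
```

Both options produce a one-dimensional integer array, while the wrapped version stays a zero-dimensional object array holding the map object itself.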

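Looking further into PyArray_GetArrayParamsFromObject, which is mentioned in the linked answer, the end of the function calls PySequence_Check. That function belongs to CPython itself rather than NumPy, and there is a good discussion of it on Stack Overflow: What is Python's sequence protocol?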

Earlier, in the same file, it says:

   /*
     * PySequence_Check detects whether an old type object is a
     * sequence by the presence of the __getitem__ attribute, and
     * for new type objects that aren't dictionaries by the
     * presence of the __len__ attribute as well. In either case it
     * is possible to have an object that tests as a sequence but
     * doesn't behave as a sequence and consequently, the
     * PySequence_GetItem call can fail. When that happens and the
     * object looks like a dictionary, we truncate the dimensions
     * and set the object creation flag, otherwise we pass the
     * error back up the call chain.
     */

This seems to be the main part of "array-like": any object that has __getitem__ and __len__ is array-like. range has both, while map has neither.
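PySequence_Check can be called from Python to see how it classifies each object. This is a rough sketch that assumes CPython, where ctypes.pythonapi exposes the C API; the candidates dictionary is just an illustrative set of objects.

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
PySequence_Check = ctypes.pythonapi.PySequence_Check
PySequence_Check.argtypes = [ctypes.py_object]
PySequence_Check.restype = ctypes.c_int

candidates = {
    "list": [1, 2, 3],
    "range": range(3),
    "map": map(str.lower, "ABC"),
    "dict": {"a": 1},
}

# PySequence_Check returns 1 for objects that provide the sequence protocol.
results = {name: bool(PySequence_Check(obj)) for name, obj in candidates.items()}
print(results)
```

The list and the range pass the check, while the map object does not. The dict is rejected explicitly even though it has __getitem__, which matches the special case for dictionaries described in the comment above.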

Try it yourself!

__getitem__ and __len__ are necessary and sufficient to make a sequence, and therefore to get the column to display as you wish instead of as a single object.

Try this:

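import numpy as np

class Column(object):
    # Minimal sequence: a length plus integer indexing.
    def __len__(self):
        return 5
    def __getitem__(self, index):
        if 0 <= index < 5:
            return index+5
        else:
            # Out-of-range indexes must raise IndexError.
            raise IndexError

col = Column()
a_col = np.array(col)
print(a_col)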

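If you don't have either __getitem__() or __len__(), numpy will still create an array for you, but the array will contain the object itself; numpy won't iterate through it for you.
If you have both methods, it displays the way you want.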

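(Thanks to user2357112 for correcting me. In a slightly simpler example, I thought __iter__ was required. It isn't. The __getitem__ function does need to make sure the index is in range, though.)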
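The reason __iter__ is optional is that Python falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised. A minimal sketch of that fallback follows; IndexOnly is a hypothetical class name, not part of the original answer.

```python
class IndexOnly:
    """Hypothetical class with __len__ and __getitem__ but no __iter__."""

    def __len__(self):
        return 5

    def __getitem__(self, index):
        if 0 <= index < 5:
            return index + 5
        # Raising IndexError is what tells the iteration fallback to stop.
        raise IndexError(index)


col = IndexOnly()
print(hasattr(col, "__iter__"))  # the class defines no __iter__
print(list(col))                 # iteration still works through __getitem__
```

If __getitem__ returned a value for every index instead of raising IndexError, this fallback iteration would never terminate, which is why the range check matters.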

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/46732436/
