python - Why does the pandas Series.map method work for column concatenation?

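According to several other posts, a simple way to concatenate a string onto a DataFrame column is to use map, as in the example below. The map function returns a Series, so why can't I just use a regular Series instead of map?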

import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]},index=['m','n','o'])
df['x'] = df.a.map(str) + "_x"

    a   b   x
m   1   4   1_x
n   2   5   2_x
o   3   6   3_x

This also works if I explicitly wrap the result in a Series.

df['y'] = pd.Series(df.a.map(str)) + "_y"

    a   b   x    y
m   1   4   1_x  1_y
n   2   5   2_x  2_y
o   3   6   3_x  3_y

This does not work; it raises a TypeError:

df['z'] = df['a'] + "_z"
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'str'

This does not work either:

df['z'] = pd.Series(df['a']) + "_z"
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'str'

I checked whether map returns a different type of object behind the scenes, but it does not seem to:

type(pd.Series(df.a.map(str)))
pandas.core.series.Series

type(pd.Series(df['a']))
pandas.core.series.Series

I am confused about what map is doing to make this work, and how that carries over to the subsequent string arithmetic.

Best answer

map maps each input value to the corresponding value produced by the object you pass in.

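Typically the object passed in is a Series, a dict, or a function. In your case, map calls the str constructor as a function on each value, and the resulting strings are then concatenated with '_x'.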
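As a minimal sketch of those three kinds of argument, the following applies each one to a small Series. The Series `s` and the lookup tables are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data, mirroring column 'a' from the question.
s = pd.Series([1, 2, 3], index=['m', 'n', 'o'])

# 1. A function: called once per element (here the str constructor).
by_func = s.map(str)

# 2. A dict: each value is looked up as a key.
by_dict = s.map({1: 'one', 2: 'two', 3: 'three'})

# 3. A Series: each value is looked up in the other Series' index.
lookup = pd.Series(['one', 'two', 'three'], index=[1, 2, 3])
by_series = s.map(lookup)

# Once the elements are strings, + performs string concatenation.
joined = by_func + '_x'

print(by_func.tolist())
print(by_dict.tolist())
print(by_series.tolist())
print(joined.tolist())
```

In all three cases the result is a new Series with the same index as `s`; only the function form is what the question's `df.a.map(str)` uses.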

However, as you found, df['a'] + "_z" and pd.Series(df['a']) + "_z" will not work, because the + operator is not defined for those operand types (an integer ndarray and a str).

You can use:

In [8]:    
df['a'].astype(str) + '_z'

Out[8]:
m    1_z
n    2_z
o    3_z
Name: a, dtype: object

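The thing to note is that when you call df['a'].map(str), the elements become str objects, so the Series dtype changes from an integer dtype to object: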

In [13]:    
df['a'].map(str).dtype

Out[13]:
dtype('O')

So you can see why your first version works: you have essentially changed the dtype of the Series, which makes it equivalent to df['a'].astype(str) above.
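To make that comparison concrete, here is a small sketch that rebuilds the DataFrame from the question and checks the three cases side by side. The variable names `mapped`, `casted`, and `raised` are illustrative only, and the exact dtype printed for the string results may differ between pandas versions (older versions report object).

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['m', 'n', 'o'])

# Both calls convert every element of the integer column to a str.
mapped = df['a'].map(str)
casted = df['a'].astype(str)

# The integer column itself cannot be added to a string.
try:
    df['a'] + '_z'
    raised = False
except TypeError:
    raised = True

print(df['a'].dtype)                       # integer dtype of the original column
print(mapped.dtype, casted.dtype)          # string-holding dtype (version dependent)
print(mapped.tolist() == casted.tolist())  # same values either way
print((casted + '_z').tolist())
print(raised)
```

Since `astype(str)` states the intent (a dtype conversion) directly, it is arguably the clearer choice when no per-element logic is needed.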

Regarding "python - Why does the pandas Series.map method work for column concatenation?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31302007/
