python - Why doesn't replace work with a dictionary that has tuples as keys?

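I always thought that .map and .replace were essentially the same, except that you would use .replace when you want values whose keys are not in the provided dictionary to pass through unchanged. However, I'm confused as to why .replace throws a TypeError when passed a dictionary with tuples as keys, while .map works as expected with the same dictionary.

For example: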

import pandas as pd
df = pd.DataFrame({'ID1': [1, 2, 3, 4, 5], 
                   'ID2': ['A', 'B', 'C', 'D', 'E']})
df['tup_col'] = pd.Series(list(zip(df.ID1, df.ID2)))

dct = {(1, 'A'): 'apple', (3, 'C'): 'banana', (5, 'X'): 'orange'}

df.tup_col.map(dct)
#0     apple
#1       NaN
#2    banana
#3       NaN
#4       NaN
#Name: tup_col, dtype: object

df.tup_col.replace(dct)

TypeError: Cannot compare types 'ndarray(dtype=object)' and 'tuple'

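So, can I not use replace with a dictionary that has tuples as keys?

Best answer

No, this won't work

First, pandas extracts the keys and values from the dictionary, then calls replace with those iterables: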

keys, values = zip(*items)
to_replace, value = keys, values

return self.replace(to_replace, value, inplace=inplace,
                    limit=limit, regex=regex)
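To make that first step concrete, here is a small sketch in plain Python (no pandas involved) of what `zip(*items)` does to the dictionary from the question. It assumes `items` is the dictionary's `.items()` view, which the excerpt above does not show explicitly.

```python
# The dictionary from the question.
dct = {(1, 'A'): 'apple', (3, 'C'): 'banana', (5, 'X'): 'orange'}

# Transpose the (key, value) pairs into one tuple of keys and one of values.
keys, values = zip(*dct.items())

print(keys)    # ((1, 'A'), (3, 'C'), (5, 'X'))
print(values)  # ('apple', 'banana', 'orange')
```

After this step the tuple keys are just elements of a list-like `to_replace`, and nothing marks them as single values any more.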

Next, because the keys and values are now list-like, the call goes into replace_list:

elif is_list_like(to_replace):  # [NA, ''] -> [0, 'missing']
    if is_list_like(value):
        new_data = self._data.replace_list(src_list=to_replace, dest_list=value,
                                           inplace=inplace, regex=regex)

Next, replace_list tries to compare each tuple key against the array of values:

def comp(s):
    if isnull(s):
        return isnull(values)
    return _possibly_compare(values, getattr(s, 'asm8', s),
                             operator.eq)

masks = [comp(s) for i, s in enumerate(src_list)]
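For comparison, here is a simplified illustration of how this mask-based approach behaves when the keys are ordinary scalars. It is not the pandas implementation: the array, `src_list`, and `dest_list` below are made-up example data, and the loop that applies the masks is an assumption about the general idea only.

```python
import operator

import numpy as np

# Hypothetical example data with scalar (string) keys.
values = np.array(['x', 'y', 'x', 'z'], dtype=object)
src_list = ['x', 'y']
dest_list = ['X', 'Y']

# One boolean mask per source value, computed up front.
masks = [operator.eq(values, s) for s in src_list]

# Apply each mask to a copy of the original values.
result = values.copy()
for mask, dest in zip(masks, dest_list):
    result[mask] = dest

print(result.tolist())  # ['X', 'Y', 'X', 'z']
```

With scalar keys, each comparison yields a boolean array of the same length as `values`. That is the shape of result the tuple case fails to produce.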

Finally, _possibly_compare finds that the comparison produced a single scalar even though at least one operand is an array, and raises the error:

if is_scalar(result) and (is_a_array or is_b_array):
    raise TypeError("Cannot compare types %r and %r" % tuple(type_names))

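I've left out some bits here, possibly important ones. But hopefully you get the gist.

Conclusion

In my opinion, pd.Series.replace has serious problems. Unlike most of the pandas API, it is often unpredictable, both in what it does and in how it performs. It is also clear that chunks of it are written in pure Python and do not perform well.

The documentation sums up the ambiguity well: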
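One way to see why a tuple key is awkward in this code path is to check how pandas and NumPy classify it. The sketch below uses only the public helpers in `pandas.api.types` and is not the internal code quoted above. The connection to the TypeError is my reading of the walkthrough, not something the pandas source states.

```python
import numpy as np
from pandas.api.types import is_list_like, is_scalar

key = (1, 'A')

print(is_scalar(key))      # False
print(is_list_like(key))   # True
print(is_scalar('apple'))  # True

# NumPy likewise treats the tuple as a 2-element sequence, not one value.
as_array = np.asarray(key, dtype=object)
print(as_array.shape)      # (2,)
```

Because the tuple is treated as a sequence rather than a single value, comparing it against a column of values does not behave like an elementwise comparison against a scalar.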

This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.

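pd.Series.map, by contrast, is efficient and is not affected by the pure-Python logic implemented in replace.

See "Replace values in a pandas series via dictionary efficiently" for another example.

Stick with map, and don't look back to replace.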
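If what you wanted from replace was its pass-through behaviour (leaving values alone when their key is missing from the dictionary), one possible way to get it with map is to pass a function that falls back to the original value. This is a minimal sketch, not the only option. It builds the tuple column directly from a list, and the name `replaced` is introduced only for illustration.

```python
import pandas as pd

# Same tuples as df.tup_col in the question, built directly for brevity.
tup_col = pd.Series([(1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E')])
dct = {(1, 'A'): 'apple', (3, 'C'): 'banana', (5, 'X'): 'orange'}

# dict.get(key, key) returns the mapped value, or the key itself if absent.
replaced = tup_col.map(lambda key: dct.get(key, key))

print(replaced.tolist())
# ['apple', (2, 'B'), 'banana', (4, 'D'), (5, 'E')]
```

Passing a function means pandas calls it once per element, so for large Series it may be slower than passing the dictionary directly.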

For the original of this question, "Why doesn't replace work with a dictionary that has tuples as keys?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/51309108/
