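python - Pandas: compare two string columns to create a third column

Tags: python pandas user-defined-functions

My DataFrame has two columns, each holding one of the classes diamond, gold, or silver: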

import pandas as pd

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver']})

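I want to create a new column that shows whether the class was upgraded or downgraded.

What I tried

I wrote the following function to set the rules: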

def status_desc(class_pd, old_class, new_class):
    if ((class_pd['old_class'] == 'gold') & (class_pd['new_class'] == 'diamond') or \
       (class_pd['old_class'] == 'silver') & (class_pd['new_class'] == 'diamond') or \
       (class_pd['old_class'] == 'silver') & (class_pd['new_class'] == 'gold')):
        val = 'Upgrade'
    elif ((class_pd['old_class'] == 'diamond') & (class_pd['new_class'] == 'gold') or \
       (class_pd['old_class'] == 'diamond') & (class_pd['new_class'] == 'silver') or \
       (class_pd['old_class'] == 'gold') & (class_pd['new_class'] == 'silver')):
        val = 'Downgrade'
    else:
         val = 'NA'

Then I tried to apply the function to my DataFrame like this:

class_pd['class_desc'] = class_pd.apply(lambda x: status_desc(class_pd['old_class'], class_pd['new_class']), axis=1)

Error

I get this error:

TypeError: status_desc() missing 1 required positional argument: new_class

Desired output

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver'],
                         'class_desc': ['Upgrade', 'Downgrade', 'NA']})

Best answer
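
On the error itself: `status_desc` is declared with three parameters (`class_pd`, `old_class`, `new_class`), but the lambda passes only two, hence the `TypeError`. The function also never returns `val`, and it compares whole columns rather than the values of a single row. A minimal row-wise sketch of a fix, assuming a hypothetical `RANK` mapping that is not part of the original post:

```python
import pandas as pd

class_pd = pd.DataFrame({
    'old_class': ['gold', 'gold', 'silver'],
    'new_class': ['diamond', 'silver', 'silver'],
})

# Hypothetical helper mapping: a higher number means a higher class.
RANK = {'silver': 0, 'gold': 1, 'diamond': 2}


def status_desc(row):
    """Label one row by comparing the rank of its old and new class."""
    old, new = RANK[row['old_class']], RANK[row['new_class']]
    if new > old:
        return 'Upgrade'
    if new < old:
        return 'Downgrade'
    return 'NA'


# With axis=1, apply passes each row to the function as a Series.
class_pd['class_desc'] = class_pd.apply(status_desc, axis=1)
print(class_pd)
```

This keeps the `apply` approach from the question, but row-wise `apply` runs a Python function per row, so it tends to be slower than vectorized comparisons on large frames.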

Another solution uses pd.Categorical, which seems more elegant and more extensible to me:

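# Classes listed from lowest to highest.
categories = ['silver', 'gold', 'diamond']
# Make both columns ordered categoricals so < and > follow that ranking.
class_pd = class_pd.apply(pd.Categorical, categories=categories, ordered=True)

# Default label for rows whose class did not change.
class_pd['class_desc'] = 'NA'

class_pd.loc[class_pd.old_class > class_pd.new_class, 'class_desc'] = 'Downgrade'
class_pd.loc[class_pd.old_class < class_pd.new_class, 'class_desc'] = 'Upgrade'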

We tell pandas the inherent order of the classes, and can then use the comparison operators on them.
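
To make the ordering idea concrete, here is a small sketch that declares the order with `CategoricalDtype` (one way to build an ordered categorical; the variable names are illustrative and not from the original answer). Each value is stored as an integer code that follows the declared order, which is what makes `<` and `>` meaningful:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Ordered dtype: silver < gold < diamond.
tier = CategoricalDtype(categories=['silver', 'gold', 'diamond'], ordered=True)

s_old = pd.Series(['gold', 'gold', 'silver'], dtype=tier)
s_new = pd.Series(['diamond', 'silver', 'silver'], dtype=tier)

# Integer codes reflect each value's position in the declared order.
old_codes = s_old.cat.codes.tolist()

# Element-wise comparison follows the category order, not alphabetical order.
upgraded = (s_old < s_new).tolist()
downgraded = (s_old > s_new).tolist()

print(old_codes, upgraded, downgraded)
```

Adding a new class later only requires inserting it at the right position in the `categories` list; the comparisons themselves stay unchanged.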

@jezrael suggested another way to do the last part (after the columns have been made categorical), using numpy.select:

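import numpy as np

conditions = [
    class_pd.old_class < class_pd.new_class,
    class_pd.old_class > class_pd.new_class,
    class_pd.old_class == class_pd.new_class,
]
labels = ["Upgrade", "Downgrade", "NA"]
# Pass a string default so the labels and the fallback share a dtype;
# newer NumPy versions raise a TypeError for string choices with the integer default 0.
class_pd["class_desc"] = np.select(conditions, labels, default="NA")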
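
Putting both steps together, the following is a self-contained sketch rather than the answerer's exact code: `add_class_desc` is a hypothetical helper name, and it uses `astype` with `CategoricalDtype` to set the order. `np.select` checks the conditions in order and falls back to `default` for rows that match none of them:

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype


def add_class_desc(df, categories):
    """Return a copy of df with a class_desc column (hypothetical helper)."""
    tier = CategoricalDtype(categories=categories, ordered=True)
    old = df['old_class'].astype(tier)
    new = df['new_class'].astype(tier)

    out = df.copy()
    # The first matching condition wins; unmatched rows get the default label.
    out['class_desc'] = np.select(
        [old < new, old > new],
        ['Upgrade', 'Downgrade'],
        default='NA',
    )
    return out


class_pd = pd.DataFrame({
    'old_class': ['gold', 'gold', 'silver'],
    'new_class': ['diamond', 'silver', 'silver'],
})
result = add_class_desc(class_pd, ['silver', 'gold', 'diamond'])
print(result)
```

Working on a copy leaves the original string columns untouched, which may be convenient if other code still expects plain strings.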

Regarding "python - Pandas: compare two string columns to create a third column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74054640/
