python - How do I initialize a new column with None and conditionally update its values with a tuple?

Tags: python pandas dataframe

I have the following code:

import pandas as pd
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)
df['auditor'] = None
df.loc[df['points'] == 50, 'auditor'] = (1, 2)
print(df)
print(df.loc[df['points'] == 50, 'auditor'])

I want to initialize a new column with None and then conditionally update its values with a tuple, but I get the following error:

ValueError: cannot set using a multi-index selection indexer with a different length than the value

The result I want is:

      month  points  points_h1  time  year  auditor
0       NaN      50        NaN  5:00  2010  (1,2)
1  february      25        NaN  6:00   NaN  None
2   january      90        NaN  9:00   NaN  None
3      june     NaN         20   NaN   NaN  None

How can I do this?

Best answer

You can also use np.where(), which is a handy conditional function:

df['auditor'] = np.where((df['points'] == 50), pd.Series([(1, 2)]), None)
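The tuple is wrapped in pd.Series([...]) because np.where would otherwise read a bare (1, 2) as a two-element array and try to broadcast it against the four-row condition. Below is a minimal sketch of that behaviour, assuming a trimmed-down version of the question's data. The one-element object array named cell is an illustrative stand-in for the Series, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Trimmed-down version of the question's frame: only the 'points' column.
df = pd.DataFrame({'points': [50, 25, 90, None]})
mask = (df['points'] == 50).to_numpy()

# A bare tuple is read as an array of shape (2,), which cannot be
# broadcast against the four-element mask.
try:
    np.where(mask, (1, 2), None)
    bare_tuple_failed = False
except ValueError:
    bare_tuple_failed = True

# Hypothetical stand-in for pd.Series([(1, 2)]): a one-element object
# array whose single cell holds the whole tuple.
cell = np.empty(1, dtype=object)
cell[0] = (1, 2)

# The one-element array broadcasts to every row where the mask is True.
df['auditor'] = np.where(mask, cell, None)
```

Either wrapper works for the same reason: the tuple becomes a single object-typed element, so it is broadcast as one value rather than unpacked.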

Or do it in one line with .assign() when creating the DataFrame:

df = pd.DataFrame(d).assign(auditor=lambda x: np.where(x['points'] == 50, pd.Series([(1, 2)]), None))

import pandas as pd, numpy as np
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
# The lambda receives the newly built frame, so df does not need to exist beforehand.
df = pd.DataFrame(d).assign(auditor=lambda x: np.where(x['points'] == 50, pd.Series([(1, 2)]), None))
df

Out[34]: 
   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    25.0  6:00     NaN  february        NaN    None
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None
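If the broadcasting trick feels indirect, a plain list comprehension is another option. This is an alternative sketch rather than part of the original answer. It assumes the same data as above and builds the column one row at a time, so each tuple is stored as a single cell.

```python
import pandas as pd

d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)

# One Python object per row: the tuple where the condition holds,
# None everywhere else (NaN == 50 is False, so missing points give None).
df['auditor'] = [(1, 2) if p == 50 else None for p in df['points']]
```

This will likely be slower than np.where on large frames, but it avoids any question of how the tuple is broadcast.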

Per your comment, if you want to define the conditions and results manually and then loop over np.where(), you would do this:

import pandas as pd, numpy as np
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)

# Manually set conditions and results
c1 = (df['points'] == 50)
r1 = pd.Series([(1, 2)])
c2 = (df['points'] == 25)
r2 = pd.Series([(1, 3)])
conditions = [c1, c2]
results = [r1, r2]

df['auditor'] = None
for c, r in zip(conditions, results):
    df['auditor'] = np.where(c, r, df['auditor'])
df

Out[39]: 
   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    25.0  6:00     NaN  february        NaN  (1, 3)
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None

See Anky's comment. Instead of:

df['auditor'] = None
for c, r in zip(conditions, results):
    df['auditor'] = np.where(c, r, df['auditor'])

you can use np.select to avoid the loop. This is a more Pythonic and more efficient way to do it:

df['auditor'] = np.select(conditions, results, None)
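For reference, here is the np.select variant as a self-contained sketch, assuming the same data and the same two conditions as in the loop version. np.select picks the first matching condition for each row and falls back to the default (None here) where nothing matches.

```python
import numpy as np
import pandas as pd

d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)

# Boolean masks, converted to plain NumPy arrays for np.select.
conditions = [(df['points'] == 50).to_numpy(),
              (df['points'] == 25).to_numpy()]

# Each tuple sits in a one-element Series so it is treated as a single
# value and broadcast to the rows where its condition is True.
results = [pd.Series([(1, 2)]), pd.Series([(1, 3)])]

# Rows matching no condition receive the default, None.
df['auditor'] = np.select(conditions, results, default=None)
```

Adding another rule then only means appending one more mask and one more wrapped tuple to the two lists.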

This Q&A on initializing a new column with None and conditionally updating its values with a tuple is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/63533446/
