python - How do I initialize a new column with None and conditionally update its values with a tuple?

I have the following code:

import pandas as pd
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)
df['auditor'] = None
df.loc[df['points'] == 50, 'auditor'] = (1, 2)
print(df)
print(df.loc[df['points'] == 50, 'auditor'])

I want to initialize a new column with None and then conditionally update its values with a tuple, but I get the following error:

ValueError: cannot set using a multi-index selection indexer with a different length than the value

The result I want is:

      month  points  points_h1  time  year  auditor
0       NaN      50        NaN  5:00  2010  (1,2)
1  february      25        NaN  6:00   NaN  None
2   january      90        NaN  9:00   NaN  None
3      june     NaN         20   NaN   NaN  None

How can I do this?

Best answer

You can also use np.where(), which is a good function for conditional assignment:

df['auditor'] = np.where((df['points'] == 50), pd.Series([(1, 2)]), None)
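The tuple is wrapped in a one-element pd.Series for a reason: np.where() broadcasts its arguments, so a bare (1, 2) is read as a two-element array rather than as a single value. The following is a minimal sketch of that difference, assuming an illustrative four-row frame that only carries the points values from the question:

```python
import numpy as np
import pandas as pd

# Illustrative frame: same 'points' values as the question's data
df = pd.DataFrame({'points': [50, 25, 90, np.nan]})
mask = df['points'] == 50  # NaN compares as False

# A bare tuple is converted to an array of shape (2,), which cannot be
# broadcast against the four-row mask.
try:
    np.where(mask, (1, 2), None)
    bare_tuple_ok = True
except ValueError:
    bare_tuple_ok = False

# Wrapped in a one-element Series, the tuple is a single object cell of
# shape (1,), which broadcasts to every row where the mask is True.
auditor = np.where(mask, pd.Series([(1, 2)]), None)
df['auditor'] = auditor
print(df)
```

Here bare_tuple_ok ends up False, while the wrapped version places the whole tuple in the matching row and None everywhere else.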

Or do it in one line with .assign() when creating the DataFrame:

# The lambda receives the frame being built, so df need not exist beforehand
df = pd.DataFrame(d).assign(auditor=lambda x: np.where(x['points'] == 50, pd.Series([(1, 2)]), None))

import pandas as pd, numpy as np
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
# The lambda receives the frame being built, so df need not exist beforehand
df = pd.DataFrame(d).assign(auditor=lambda x: np.where(x['points'] == 50, pd.Series([(1, 2)]), None))
df

Out[34]: 
   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    25.0  6:00     NaN  february        NaN    None
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None

Based on your comment, if you want to define the conditions and results manually and then loop over them with np.where(), you would do this:

import pandas as pd, numpy as np
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)

# Manually set conditions and results
c1 = (df['points'] == 50)
r1 = pd.Series([(1, 2)])
c2 = (df['points'] == 25)
r2 = pd.Series([(1, 3)])
conditions = [c1, c2]
results = [r1, r2]

df['auditor'] = None
for c, r in zip(conditions, results):
    df['auditor'] = np.where(c, r, df['auditor'])
df

Out[39]: 
   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    25.0  6:00     NaN  february        NaN  (1, 3)
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None

See Anky's comment. Instead of:

df['auditor'] = None
for c, r in zip(conditions, results):
    df['auditor'] = np.where(c, r, df['auditor'])

you can use np.select to avoid the loop. This is a more Pythonic and efficient way to do it:

df['auditor'] = np.select(conditions, results, None)
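For reference, a self-contained sketch of the np.select version, reusing the same conditions and results as the loop example. It assumes each result is still wrapped in a one-element pd.Series so that every tuple stays a single object, and it passes None as the default for rows where no condition matches:

```python
import numpy as np
import pandas as pd

d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': "february"},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)

conditions = [df['points'] == 50, df['points'] == 25]
results = [pd.Series([(1, 2)]), pd.Series([(1, 3)])]

# The first matching condition wins; unmatched rows get the default
df['auditor'] = np.select(conditions, results, default=None)
print(df['auditor'].tolist())
```

One difference worth keeping in mind: when several conditions are true for the same row, np.select takes the result of the first one in the list, whereas the np.where loop lets later conditions overwrite earlier ones. With the non-overlapping conditions used here, both give the same column.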

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/63533446/
