I have the following code:
import pandas as pd
d = [{'points': 50, 'time': '5:00', 'year': 2010},
{'points': 25, 'time': '6:00', 'month': "february"},
{'points': 90, 'time': '9:00', 'month': 'january'},
{'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)
df['auditor'] = None
df.loc[df['points'] == 50, 'auditor'] = (1, 2)
print(df)
print(df.loc[df['points'] == 50, 'auditor'])
I want to create a new column initialized with None and then conditionally update its value with a tuple, but I get the following error:
ValueError: cannot set using a multi-index selection indexer with a different length than the value
The result I want is:
      month  points  points_h1  time  year auditor
0       NaN      50        NaN  5:00  2010   (1,2)
1  february      25        NaN  6:00   NaN    None
2   january      90        NaN  9:00   NaN    None
3      june     NaN         20   NaN   NaN    None
How can I do this?
Best answer
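You can also use np.where(), which is a handy conditional function. Wrapping the tuple in a one-element pd.Series keeps it as a single object instead of a two-element sequence: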
df['auditor'] = np.where((df['points'] == 50), pd.Series([(1, 2)]), None)
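Why the pd.Series wrapper matters can be seen in a small sketch. It assumes a hand-written four-element boolean mask (standing in for df['points'] == 50) and relies only on NumPy's usual broadcasting rules; the variable names are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Stand-in for df['points'] == 50 on a four-row frame.
mask = np.array([True, False, False, False])

# A bare tuple is read as a 2-element array, which cannot be
# broadcast against a 4-element mask.
try:
    np.where(mask, (1, 2), None)
    bare_tuple_ok = True
except ValueError:
    bare_tuple_ok = False

# A one-element Series becomes an object array of shape (1,),
# so the single tuple is broadcast to every row where mask is True.
wrapped = np.where(mask, pd.Series([(1, 2)]), None)
```

Here bare_tuple_ok ends up False, while wrapped holds the tuple in the first position and None elsewhere.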
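Or do it in one line when creating the DataFrame, using .assign(). Pass a callable so the condition is evaluated on the frame being built, rather than on a df variable that may not exist yet:
df = pd.DataFrame(d).assign(auditor=lambda x: np.where(x['points'] == 50, pd.Series([(1, 2)]), None))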
import pandas as pd, numpy as np
d = [{'points': 50, 'time': '5:00', 'year': 2010},
{'points': 25, 'time': '6:00', 'month': "february"},
{'points': 90, 'time': '9:00', 'month': 'january'},
{'points_h1': 20, 'month': 'june'}]
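# A callable lets assign() evaluate the condition on the new frame,
# so no previously defined df is required.
df = pd.DataFrame(d).assign(auditor=lambda x: np.where(x['points'] == 50, pd.Series([(1, 2)]), None))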
df
Out[34]:
   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    25.0  6:00     NaN  february        NaN    None
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None
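For comparison, here is a sketch of a different technique that does not rely on NumPy broadcasting at all: building the column as a plain Python list with a list comprehension. This is not from the original answer, and the cut-down frame with only a points column is an assumption made to keep the example short.

```python
import pandas as pd

# Reduced version of the question's data: only the 'points' column.
df = pd.DataFrame({'points': [50, 25, 90, None]})

# Build one Python object per row, so pandas stores each tuple whole
# instead of trying to spread its elements across rows.
# NaN == 50 is False, so the missing value also maps to None.
df['auditor'] = [(1, 2) if p == 50 else None for p in df['points']]
```

This is slower than np.where on large frames, since it loops in Python, but it makes the per-row intent explicit.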
Per your comment, if you want to define the conditions and results manually and then loop over them with np.where(), you would do this:
import pandas as pd, numpy as np
d = [{'points': 50, 'time': '5:00', 'year': 2010},
{'points': 25, 'time': '6:00', 'month': "february"},
{'points': 90, 'time': '9:00', 'month': 'january'},
{'points_h1': 20, 'month': 'june'}]
df = pd.DataFrame(d)
# Manually set the conditions and their results
c1 = (df['points'] == 50)
r1 = pd.Series([(1, 2)])
c2 = (df['points'] == 25)
r2 = pd.Series([(1, 3)])
conditions = [c1, c2]
results = [r1, r2]
df['auditor'] = None
for c, r in zip(conditions, results):
    df['auditor'] = np.where(c, r, df['auditor'])
df
Out[39]:
   points  time    year     month  points_h1 auditor
0    50.0  5:00  2010.0       NaN        NaN  (1, 2)
1    25.0  6:00     NaN  february        NaN  (1, 3)
2    90.0  9:00     NaN   january        NaN    None
3     NaN   NaN     NaN      june       20.0    None
See Anky's comment. Instead of:
df['auditor'] = None
for c, r in zip(conditions, results):
    df['auditor'] = np.where(c, r, df['auditor'])
you can use np.select to avoid the loop. It is a more Pythonic and more efficient way to do this:
df['auditor'] = np.select(conditions, results, default=None)
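One difference is worth checking before swapping the loop for np.select: when conditions overlap, the loop of np.where calls lets the last matching condition win, while np.select takes the first match. The sketch below assumes a deliberately overlapping pair of conditions (the `>= 25` rule is hypothetical and not from the question) on a cut-down frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'points': [50, 25, 90, None]})

# Overlapping on purpose: the row with points == 50 matches both conditions.
conditions = [df['points'] >= 25, df['points'] == 50]
results = [pd.Series([(1, 3)]), pd.Series([(1, 2)])]

# Loop of np.where calls: each pass overwrites earlier matches,
# so the LAST matching condition wins.
looped = np.full(len(df), None, dtype=object)
for c, r in zip(conditions, results):
    looped = np.where(c, r, looped)

# np.select: the FIRST matching condition wins.
selected = np.select(conditions, results, default=None)
```

With these inputs, looped gives (1, 2) for the first row, whereas selected gives (1, 3). With mutually exclusive conditions, as in the answer above, both approaches produce the same column.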
Source: a similar question found on Stack Overflow, "python - How to initialize a new column with None and conditionally update its values with a tuple?": https://stackoverflow.com/questions/63533446/