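I have a pandas DataFrame in which I want to update the values of one column based on another column of the same DataFrame. I have been using the following code to do the update: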
# Note: the original used dfMod.ix, which was removed in pandas 1.0;
# .loc is the label-based equivalent.
for i1, col1 in dfMod.iterrows():
    if col1['day'] == "MONDAY":
        dfMod.loc[i1, 'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.loc[i1, 'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.loc[i1, 'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.loc[i1, 'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.loc[i1, 'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.loc[i1, 'weekIndex'] = 6
    else:
        dfMod.loc[i1, 'weekIndex'] = 7
However, the DataFrame has 300,000 rows and the loop takes a very long time to run. Is there a better way to update the column?
Best answer
You need map with a dict:
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4,
"FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].map(d)
Example:
import pandas as pd
dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3,
"THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
dfMod["weekIndex"] = dfMod["day"].map(d)
print(dfMod)
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
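One difference from the loop in the question is worth noting: its else branch assigns 7 to every value that is not MONDAY through SATURDAY, whereas map returns NaN for any key missing from the dict. The following is a minimal sketch of one way to reproduce the else behaviour, assuming that fallback is actually wanted. The frame name df and the value "HOLIDAY" are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical sample data; "HOLIDAY" is an invented value with no dict entry.
df = pd.DataFrame({"day": ["MONDAY", "SUNDAY", "HOLIDAY"]})

# Only the six explicitly tested days, as in the question's if/elif chain.
d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6}

# map yields NaN for keys missing from the dict; fillna(7) mirrors the loop's
# else branch, and astype(int) turns the float column back into integers.
df["weekIndex"] = df["day"].map(d).fillna(7).astype(int)
print(df)
```

If unexpected values should instead be surfaced as errors, leaving the NaN in place and checking for it with isna() may be the safer choice.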
Timings on 300k rows: map is about 6 times faster than the apply solution:
dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
#300k rows
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True)
d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4,
"FRIDAY":5, "SATURDAY":6, "SUNDAY":7}
In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d)
10 loops, best of 3: 22.7 ms per loop
In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x])
10 loops, best of 3: 141 ms per loop
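The %timeit lines above are IPython magics. As a rough sketch, the same comparison could be run as a plain script with the standard-library timeit module. The repeat count of 5 and the variable names t_map and t_apply are arbitrary choices for this illustration, and the absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

dfMod = pd.DataFrame({"day": ["TUESDAY", "THURSDAY", "FRIDAY",
                              "SATURDAY", "MONDAY", "SUNDAY"]})
# Repeat the 6 rows 50,000 times to get 300k rows.
dfMod = pd.concat([dfMod] * 50000).reset_index(drop=True)

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

# Sanity check: both approaches produce the same Series.
via_map = dfMod["day"].map(d)
via_apply = dfMod["day"].apply(lambda x: d[x])
assert via_map.equals(via_apply)

runs = 5  # arbitrary repeat count for this sketch
t_map = timeit.timeit(lambda: dfMod["day"].map(d), number=runs)
t_apply = timeit.timeit(lambda: dfMod["day"].apply(lambda x: d[x]), number=runs)

print(f"map:   {t_map / runs * 1000:.1f} ms per loop")
print(f"apply: {t_apply / runs * 1000:.1f} ms per loop")
```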
Source: python - Pandas: a more efficient way to update a column in a pandas DataFrame without a for loop, Stack Overflow: https://stackoverflow.com/questions/43005012/