python - Pandas: a more efficient way to update a column in a pandas DataFrame without a for loop

Tags: python pandas

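I have a pandas DataFrame in which I want to update the values of one column based on another column of the same DataFrame. Until now I have been using the following code to update it: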

# .loc replaces the original .ix indexer, which was removed in pandas 1.0
for i1, col1 in dfMod.iterrows():
    if col1['day'] == "MONDAY":
        dfMod.loc[i1, 'weekIndex'] = 1
    elif col1['day'] == "TUESDAY":
        dfMod.loc[i1, 'weekIndex'] = 2
    elif col1['day'] == "WEDNESDAY":
        dfMod.loc[i1, 'weekIndex'] = 3
    elif col1['day'] == "THURSDAY":
        dfMod.loc[i1, 'weekIndex'] = 4
    elif col1['day'] == "FRIDAY":
        dfMod.loc[i1, 'weekIndex'] = 5
    elif col1['day'] == "SATURDAY":
        dfMod.loc[i1, 'weekIndex'] = 6
    else:
        dfMod.loc[i1, 'weekIndex'] = 7

However, the DataFrame has 300,000 rows and this takes a long time to run. Is there a better way to update the column?

Best answer

You need map with a dict:

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
     "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].map(d)

Sample:

import pandas as pd

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, 
     "THURSDAY":4, "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

dfMod["weekIndex"] = dfMod["day"].map(d)
print(dfMod)
        day  weekIndex
0   TUESDAY          2
1  THURSDAY          4
2    FRIDAY          5
3  SATURDAY          6
4    MONDAY          1
5    SUNDAY          7
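
One difference from the original loop is worth noting: the loop's else branch assigns 7 to every value that is not MONDAY through SATURDAY, whereas map returns NaN for any value missing from the dict. Below is a minimal sketch, assuming you want to keep that fallback behaviour. The sample data, including the "HOLIDAY" value, is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; "HOLIDAY" is deliberately not a key in the dict.
df = pd.DataFrame({"day": ["MONDAY", "SATURDAY", "HOLIDAY"]})

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3,
     "THURSDAY": 4, "FRIDAY": 5, "SATURDAY": 6}

# map leaves unmatched values as NaN; fillna(7) mimics the loop's else branch,
# and astype(int) restores an integer dtype (the NaN forces float otherwise).
df["weekIndex"] = df["day"].map(d).fillna(7).astype(int)
print(df)
```

If unexpected values should be treated as errors rather than mapped to 7, leaving the NaN in place and checking for it with isna() may be the safer choice.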

Timings: on 300k rows, the map solution is about 6 times faster than the apply solution:

dfMod = pd.DataFrame({'day':['TUESDAY','THURSDAY','FRIDAY','SATURDAY','MONDAY','SUNDAY']})
#300k rows
dfMod = pd.concat([dfMod]*50000).reset_index(drop=True)

d = {"MONDAY": 1, "TUESDAY":2, "WEDNESDAY":3, "THURSDAY":4, 
     "FRIDAY":5, "SATURDAY":6, "SUNDAY":7}

In [92]: %timeit dfMod["weekIndex"] = dfMod["day"].map(d)
10 loops, best of 3: 22.7 ms per loop

In [93]: %timeit dfMod["weekIndex1"] = dfMod["day"].apply(lambda x: d[x])
10 loops, best of 3: 141 ms per loop
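
The %timeit commands above only work inside IPython or Jupyter. As a rough sketch of how the same comparison could be reproduced in a plain Python script, the standard-library timeit module can be used. The variable names t_map and t_apply are illustrative, and the absolute numbers will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

days = ["TUESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "MONDAY", "SUNDAY"]
df = pd.DataFrame({"day": days * 50000})  # 300k rows, as in the question

d = {"MONDAY": 1, "TUESDAY": 2, "WEDNESDAY": 3, "THURSDAY": 4,
     "FRIDAY": 5, "SATURDAY": 6, "SUNDAY": 7}

runs = 10
# Total time for `runs` executions of each approach.
t_map = timeit.timeit(lambda: df["day"].map(d), number=runs)
t_apply = timeit.timeit(lambda: df["day"].apply(lambda x: d[x]), number=runs)

print(f"map:   {t_map / runs * 1000:.1f} ms per loop")
print(f"apply: {t_apply / runs * 1000:.1f} ms per loop")

# Sanity check: both approaches should produce identical results.
same = df["day"].map(d).equals(df["day"].apply(lambda x: d[x]))
print("identical results:", same)
```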

Regarding "python - Pandas: a more efficient way to update a column in a pandas DataFrame without a for loop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43005012/
