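I am trying to generate a new column in a pandas DataFrame by looping over more than 100,000 rows and setting each row's value based on a condition on an existing column.
The DataFrame below is a dummy, but it works as an example. My current code is: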
import pandas as pd

df = pd.DataFrame({'IT100': [5, 5, -0.001371, 0.0002095, -5, 0, -5, 5, 5],
                   'ET110': [0.008187884, 0.008285232, 0.00838258, 0.008479928, 1, 1, 1, 1, 1]})

# If charging, set CD to 1; if discharging, set CD to -1.
# If -1 < IT100 < 1, set CD to the previous row's value.
# Charging is defined as IT100 > 1 and discharging as IT100 < -1.
def CD(dataFrame):
    for x in range(0, len(dataFrame.index)):
        current = dataFrame.loc[x, "IT100"]
        if x == 0:
            # The first row has no previous row, so it is seeded from the row five positions ahead.
            if dataFrame.loc[x+5, "IT100"] > -1:
                dataFrame.loc[x, "CD"] = 1
            else:
                dataFrame.loc[x, "CD"] = -1
        else:
            if current > 1:
                dataFrame.loc[x, "CD"] = 1
            elif current < -1:
                dataFrame.loc[x, "CD"] = -1
            else:
                dataFrame.loc[x, "CD"] = dataFrame.loc[x-1, "CD"]
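Looping with if/else is very slow. I have seen people suggest np.select() or DataFrame.apply(), but I don't know whether either works for my example. I need to be able to index into the column, because one of my conditions sets the new column's value to the value of the previous cell in the column of interest.
Thanks for your help!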
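Best answer
@Grajdeanu Alex is right: the loop is slowing you down more than anything you do inside it. With pandas, an explicit loop is usually the slowest option. Try this: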
import pandas as pd
import numpy as np

df = pd.DataFrame({'IT100': [0, -50, -20, -0.5, -0.25, -0.5, -10, 5, 0.5]})
df['CD'] = np.nan
# lower saturation (discharging)
df.loc[df['IT100'] < -1, 'CD'] = -1
# upper saturation (charging)
df.loc[df['IT100'] > 1, 'CD'] = 1
# fill forward
df['CD'] = df['CD'].ffill()
# set the first row equal to the row at index 5
df.loc[0, 'CD'] = df.loc[5, 'CD']
Using ffill fills each subsequent NaN (the rows where IT100 is between -1 and 1) with the last valid value above it.
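Since the question mentions np.select(), here is a minimal sketch of the same idea written with it. The function name charge_state and its parameters threshold and first_from are made up for illustration and are not part of pandas or NumPy. The sketch assumes the same rules as above: 1 above the threshold, -1 below the negative threshold, otherwise carry the previous state forward, and seed row 0 from the row at position 5.

```python
import numpy as np
import pandas as pd


def charge_state(it100: pd.Series, threshold: float = 1.0, first_from: int = 5) -> pd.Series:
    """Hypothetical helper: 1 for charging, -1 for discharging, previous state otherwise."""
    # np.select returns the choice for the first condition that is True;
    # rows that match neither condition get the default (NaN here).
    raw = np.select(
        [it100 > threshold, it100 < -threshold],
        [1.0, -1.0],
        default=np.nan,
    )
    # Forward-fill so rows inside the dead band inherit the last known state.
    cd = pd.Series(raw, index=it100.index).ffill()
    # Assumption carried over from the answer: row 0 copies the state at position `first_from`.
    cd.iloc[0] = cd.iloc[first_from]
    return cd


df = pd.DataFrame({"IT100": [0, -50, -20, -0.5, -0.25, -0.5, -10, 5, 0.5]})
df["CD"] = charge_state(df["IT100"])
print(df)
```

Wrapping the logic in a function keeps the thresholds in one place. The forward fill is what replaces the "look at the previous cell" step of the loop, so no row-by-row indexing is needed.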
Regarding "python - How to speed up conditional statements in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65485321/