Suppose I have a Pandas DataFrame like the following:
import pandas as pd

df = pd.DataFrame({
    "YEAR": [2000, 2000, 2001, 2001, 2002],
    "VISITORS": [100, 2000, 200, 300, 250],
    "SALES": [5000, 2500, 23500, 1512, 3510],
    "MONTH": [1, 2, 1, 2, 1],
    "LOCATION": ["Loc1", "Loc2", "Loc1", "Loc2", "Loc1"],
})
I want to join this DataFrame, on the MONTH and LOCATION columns, with the previous year's data from the same DataFrame.
I tried this:
def calculate(df):
    result_all_years = []
    for current_year in df["YEAR"].unique():
        # Rows of the previous year, renamed so they do not clash after the merge
        df_previous = df.copy()
        df_previous = df_previous[df_previous["YEAR"] == current_year - 1]
        df_previous.rename(
            columns={
                "VISITORS": "VISITORS_LAST_YEAR",
                "SALES": "SALES_LAST_YEAR",
                "YEAR": "PREVIOUS_YEAR",
            },
            inplace=True,
        )
        df_current = df[df["YEAR"] == current_year]
        df_current = df_current.merge(
            df_previous,
            how="left",
            on=["MONTH", "LOCATION"],
        )
        # There are many similar calculations and additional columns to be added like the following:
        df_current["SALES_DIFF"] = df_current["SALES"] - df_current["SALES_LAST_YEAR"]
        result_all_years.append(df_current)
    return pd.concat(result_all_years, ignore_index=True).round(3)
The code in the calculate function works fine. But is there a faster way to do this?
Best answer
Try merging the DataFrame with itself, matching each row to the row one year earlier, and then work on the result:
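# Self-merge: a left row for year Y matches the right row whose YEAR + 1 == Y,
# so unsuffixed columns are the current year and *_PREV columns the previous year.
diff_df = pd.merge(
    df, df,
    left_on=[df['YEAR'], df['MONTH'], df['LOCATION']],
    right_on=[df['YEAR'] + 1, df['MONTH'], df['LOCATION']],
    suffixes=('', '_PREV'),
)
diff_df = diff_df[['YEAR', 'YEAR_PREV', 'MONTH', 'LOCATION', 'VISITORS', 'VISITORS_PREV', 'SALES', 'SALES_PREV']]
# Differences are computed as previous year minus current year
diff_df = diff_df.assign(
    VISITORS_DIFF=diff_df['VISITORS_PREV'] - diff_df['VISITORS'],
    SALES_DIFF=diff_df['SALES_PREV'] - diff_df['SALES'],
)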
Output:
YEAR  YEAR_PREV  MONTH  LOCATION  VISITORS  VISITORS_PREV  SALES  SALES_PREV  VISITORS_DIFF  SALES_DIFF
2001       2000      1      Loc1       200            100  23500        5000           -100      -18500
2001       2000      2      Loc2       300           2000   1512        2500           1700         988
2002       2001      1      Loc1       250            200   3510       23500            -50       19990
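
Note that the self-merge above relies on the default inner join of pd.merge, so rows without a previous year (here, the year 2000) are dropped, and its differences are previous minus current, which is the opposite sign of the SALES_DIFF in the question. If the behaviour of the original calculate function is wanted instead (left join, current minus previous), one possible variant is sketched below. The helper name add_previous_year is made up for illustration, and the sketch assumes that each (YEAR, MONTH, LOCATION) combination occurs at most once; duplicates would multiply rows in the merge.

```python
import pandas as pd

df = pd.DataFrame({
    "YEAR": [2000, 2000, 2001, 2001, 2002],
    "VISITORS": [100, 2000, 200, 300, 250],
    "SALES": [5000, 2500, 23500, 1512, 3510],
    "MONTH": [1, 2, 1, 2, 1],
    "LOCATION": ["Loc1", "Loc2", "Loc1", "Loc2", "Loc1"],
})


def add_previous_year(df):
    """Hypothetical helper: attach last year's values with a single left merge."""
    prev = df[["YEAR", "MONTH", "LOCATION", "VISITORS", "SALES"]].copy()
    # Shift the year forward so that year Y-1 rows line up with year Y rows
    prev["YEAR"] = prev["YEAR"] + 1
    prev = prev.rename(
        columns={"VISITORS": "VISITORS_LAST_YEAR", "SALES": "SALES_LAST_YEAR"}
    )
    # Left join keeps rows that have no previous year (their *_LAST_YEAR is NaN)
    out = df.merge(prev, how="left", on=["YEAR", "MONTH", "LOCATION"])
    # Same sign convention as in the question: current minus previous
    out["SALES_DIFF"] = out["SALES"] - out["SALES_LAST_YEAR"]
    return out


result = add_previous_year(df)
print(result)
```

Because the loop over years is replaced by one merge, further derived columns can be added as plain vectorised column operations on the result.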
Source: "python - Join with previous year plus additional calculations", a similar question found on Stack Overflow: https://stackoverflow.com/questions/71496351/