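I have two dataframes, and both contain dates. The first dataframe repeats the dates for every Type and every State, because it is a cumulative-sum frame, like this: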
Date State Type Value
2010-01-01 AK NUC 10
2010-02-01 AK NUC 10
2010-03-01 AK NUC 10
.
.
2010-01-01 CO NUC 2
2010-02-01 CO NUC 2
.
.
2010-01-01 AK WND 20
2010-02-01 AK WND 21
.
.
2018-08-01 .......
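What I need to do is take the second dataframe and add its values to each 'Type' and 'State' based on the 'Operating Date', then subtract them based on the 'Retirement Date', all relative to the original 'Date'. The second dataframe looks like this: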
Operating Date Retirement Date Type State Value
2010-02-01 2010-04-01 NUC AK 1
2011-02-01 2014-02-01 NUC AK 2
2011-03-01 2016-03-01 NUC AK 10
.
.
.
2018-08-01 .......
For example, for AK the output would add and subtract like this:
if AK(Date) == AK(Operating Date):
AK(Value, Date) = AK(Value, Date) + AK(Value, Operating Date)
elif AK(Date) == AK(Retirement Date):
AK(Value, Date) = AK(Value, Date) - AK(Value, Retirement Date)
else:
continue
The actual output dataframe (for AK 'NUC' only) would be:
Date State Type Value
2010-01-01 AK NUC 10
2010-02-01 AK NUC 11
2010-03-01 AK NUC 11
2010-04-01 AK NUC 10
.
.
2011-01-01 AK NUC 10
2011-02-01 AK NUC 12
2011-03-01 AK NUC 22
2011-04-01 AK NUC 22
.
.
2016-01-01 AK NUC 22
2016-02-01 AK NUC 22
2016-03-01 AK NUC 12
2016-04-01 AK NUC 12
.
.
How do I go about this kind of operation?
Best answer
The main DataFrame used in the code below:
df
Date State Type Value
2010-01-01 AK NUC 10
2010-02-01 AK NUC 10
2010-03-01 AK NUC 10
2010-01-01 CO NUC 2
2010-02-01 CO NUC 2
2010-01-01 AK WND 20
2010-02-01 AK WND 21
The changes you want to add to the main frame. Note that I replaced the spaces in the column names with _:
delta
Operating_Date Retirement_Date Type State Value
2010-02-01 2010-04-01 NUC AK 1
2011-02-01 2014-02-01 NUC AK 2
2011-03-01 2016-03-01 NUC AK 10
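To follow along, the two sample frames can be rebuilt roughly as below. This is only a sketch: it assumes the date columns should be real datetimes (the question does not say how they are stored), and it covers just the sample rows shown above.

```python
import pandas as pd

# Main frame: cumulative Value per Date, State and Type (sample rows only).
df = pd.DataFrame({
    "Date": ["2010-01-01", "2010-02-01", "2010-03-01",
             "2010-01-01", "2010-02-01",
             "2010-01-01", "2010-02-01"],
    "State": ["AK", "AK", "AK", "CO", "CO", "AK", "AK"],
    "Type": ["NUC", "NUC", "NUC", "NUC", "NUC", "WND", "WND"],
    "Value": [10, 10, 10, 2, 2, 20, 21],
})
# Assumption: dates are parsed to datetime64 so they sort and merge cleanly.
df["Date"] = pd.to_datetime(df["Date"])

# Change frame, starting from the question's column names (with spaces).
delta = pd.DataFrame({
    "Operating Date": ["2010-02-01", "2011-02-01", "2011-03-01"],
    "Retirement Date": ["2010-04-01", "2014-02-01", "2016-03-01"],
    "Type": ["NUC", "NUC", "NUC"],
    "State": ["AK", "AK", "AK"],
    "Value": [1, 2, 10],
})
# Replace the spaces in the column names with underscores.
delta.columns = delta.columns.str.replace(" ", "_")
for col in ["Operating_Date", "Retirement_Date"]:
    delta[col] = pd.to_datetime(delta[col])
```

Both frames need the same dtype in their date columns, otherwise the merge later on will not line the dates up.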
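The plan of attack is to work with a single date column. To do that, we need to combine the retirement dates and the operating dates into one column: rows taken from the retirement date get a negative value, and rows taken from the operating date keep their positive value.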
import pandas as pd

# Copy delta to hold the retirements ("cancellations"): each row is dated by
# Retirement_Date and carries the value as a negative number
cx = delta.copy()
cx['Date'] = cx['Retirement_Date']
cx = cx.drop(['Operating_Date', 'Retirement_Date'], axis=1)
cx['Value'] *= -1
# In the original delta, Operating_Date becomes the date
delta['Date'] = delta['Operating_Date']
delta = delta.drop(['Operating_Date', 'Retirement_Date'], axis=1)
# Append the cancellations to delta (DataFrame.append was removed in pandas 2.0,
# so use pd.concat) and rename the Value column to Delta
delta = pd.concat([delta, cx])
delta = delta.rename(columns={'Value': 'Delta'})
We now have a dataframe with a single date column that holds all the positive and negative changes we want to track for each date:
delta
Type State Delta Date
NUC AK 1 2010-02-01
NUC AK 2 2011-02-01
NUC AK 10 2011-03-01
NUC AK -1 2010-04-01
NUC AK -2 2014-02-01
NUC AK -10 2016-03-01
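The same long-format frame can also be built in one pass with melt instead of copy-and-append. This is a sketch of an alternative, not what the answer's code does. The names delta_wide, events, Event and delta_long are made up for illustration, and delta_wide stands for the change frame before any columns were dropped.

```python
import pandas as pd

# Hypothetical name: the change frame with both date columns still present.
delta_wide = pd.DataFrame({
    "Operating_Date": pd.to_datetime(["2010-02-01", "2011-02-01", "2011-03-01"]),
    "Retirement_Date": pd.to_datetime(["2010-04-01", "2014-02-01", "2016-03-01"]),
    "Type": ["NUC"] * 3,
    "State": ["AK"] * 3,
    "Value": [1, 2, 10],
})

# Stack the two date columns into one; Event records which column a row came from.
events = delta_wide.melt(
    id_vars=["Type", "State", "Value"],
    value_vars=["Operating_Date", "Retirement_Date"],
    var_name="Event",
    value_name="Date",
)
# Openings add to the running total, retirements subtract from it.
sign = events["Event"].map({"Operating_Date": 1, "Retirement_Date": -1})
events["Delta"] = events["Value"] * sign
delta_long = events[["Type", "State", "Delta", "Date"]]
print(delta_long)
```

melt leaves the original frame untouched, which helps if the operating and retirement dates are needed again later.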
Now all we need to do is add the cumulative sum of the changes to the main dataframe:
# Merge the frames. The shared columns (Date, State, Type) are used as keys by
# default, so we only need to ask for an outer join
df = df.merge(delta, how='outer')
# Dates that appear only in delta have no Value yet, so carry the last known
# Value forward within each group; sort by date first so the forward fill and
# the cumulative sum run in chronological order
df = df.sort_values(['Type', 'State', 'Date'])
df['Value'] = df.groupby(['State', 'Type'])['Value'].ffill()
# Dates with no change get a delta of zero
df['Delta'] = df['Delta'].fillna(0)
# Add the cumulative sum of the deltas to the Value column
df['Value'] += df.groupby(['State', 'Type'])['Delta'].cumsum().astype(int)
# Finally, drop the Delta column again and we're done
del df['Delta']
The final dataframe, hopefully this is what you wanted:
df
Date State Type Value
2010-01-01 AK NUC 10
2010-02-01 AK NUC 11
2010-03-01 AK NUC 11
2010-04-01 AK NUC 10
2011-02-01 AK NUC 12
2011-03-01 AK NUC 22
2014-02-01 AK NUC 20
2016-03-01 AK NUC 10
2010-01-01 CO NUC 2
2010-02-01 CO NUC 2
2010-01-01 AK WND 20
2010-02-01 AK WND 21
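For reuse, the merge and cumulative-sum steps can be wrapped in one function. This is a minimal sketch under a few assumptions. The name apply_deltas is hypothetical. Every State/Type group is assumed to have a baseline row dated before its first change, otherwise the forward fill leaves NaN and the integer cast fails. Summing the changes per date first is an addition of mine, so that two changes on the same date cannot duplicate rows in the merge.

```python
import pandas as pd

KEYS = ["Type", "State", "Date"]

def apply_deltas(df, delta_long):
    """Add the running total of Delta to Value within each State/Type group."""
    # Collapse changes that share a date so the merge cannot duplicate rows.
    summed = delta_long.groupby(KEYS, as_index=False)["Delta"].sum()
    out = df.merge(summed, on=KEYS, how="outer").sort_values(KEYS)
    groups = ["State", "Type"]
    # Dates that only exist in the changes inherit the previous baseline Value.
    out["Value"] = out.groupby(groups)["Value"].ffill()
    out["Delta"] = out["Delta"].fillna(0)
    out["Value"] = out["Value"] + out.groupby(groups)["Delta"].cumsum()
    # Assumes no NaN is left in Value (see the baseline-row assumption above).
    return out.drop(columns="Delta").astype({"Value": int}).reset_index(drop=True)

# Sample data: the AK/NUC rows from the answer.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-01", "2010-02-01", "2010-03-01"]),
    "State": ["AK"] * 3,
    "Type": ["NUC"] * 3,
    "Value": [10, 10, 10],
})
delta_long = pd.DataFrame({
    "Type": ["NUC"] * 6,
    "State": ["AK"] * 6,
    "Delta": [1, 2, 10, -1, -2, -10],
    "Date": pd.to_datetime(["2010-02-01", "2011-02-01", "2011-03-01",
                            "2010-04-01", "2014-02-01", "2016-03-01"]),
})
result = apply_deltas(df, delta_long)
print(result)
```

On the AK/NUC sample this gives the same Value column as the table above: 10, 11, 11, 10, 12, 22, 20, 10.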
Regarding "python - Adding and subtracting values based on closing year condition Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53548360/