I have a very large DataFrame that looks like this:
id  amt  date
1   0    2010-02-01
1   0    2012-05-12
1   0    2016-08-09
1   20   1970-01-01
2   0    2016-03-21
2   0    2017-11-10
2   0    2012-09-01
2   0    2016-04-15
What I want is to reduce it to one row per id according to the following logic:
- For a given id group: if amt > 0 and date == 1970-01-01, output that row.
- For a given id group: if amt == 0 for all rows of the id, output the row with the max date for that id.
The desired output is:
id  amt  date
1   20   1970-01-01
2   0    2017-11-10
I have actually solved it by sorting, grouping by id, and then taking last(). However, I ran into trouble when I tried to write a function that operates on each group of the groupby object and applies the logic from points 1 and 2 above (if/else style). Can someone help me with this?
The code for the DataFrame is below. Please note that the data is large, so fast execution is helpful.
Many thanks,
/Swepab
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 2, 2],
                   'amt': [0, 0, 0, 20, 0, 0, 0, 0],
                   'date': ['2010-02-01', '2012-05-12', '2016-08-09',
                            '1970-01-01', '2016-03-21', '2017-11-10',
                            '2012-09-01', '2016-04-15']})
df['date'] = pd.to_datetime(df.date, format="%Y-%m-%d")
df = df[['id', 'amt', 'date']]  # fix the column order
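For reference, the sort-then-last() workaround mentioned above might look roughly like the following. This is only a sketch, not the original code: the name `reduced` is made up for illustration, and it assumes that within each id any row with amt > 0 is the one dated 1970-01-01, since the date itself is never checked.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "amt": [0, 0, 0, 20, 0, 0, 0, 0],
    "date": pd.to_datetime([
        "2010-02-01", "2012-05-12", "2016-08-09", "1970-01-01",
        "2016-03-21", "2017-11-10", "2012-09-01", "2016-04-15",
    ]),
})

# Sort so that, within each id, rows with a larger amt come last and
# ties on amt are ordered by date; the last row per id is then kept.
reduced = (
    df.sort_values(["id", "amt", "date"])
      .groupby("id", as_index=False)
      .last()
)
print(reduced)
```

Because last() skips missing values column by column, this shortcut is only safe when the columns contain no NaN/NaT entries.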
Best answer
I wrote a custom function that you can apply to each group:
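def custom_fx(group):
    # Point 2: every amt in the group is 0, so keep the row(s) with the latest date.
    if (group.amt == 0).all():
        return group.loc[group.date == group.date.max(), :]
    # Point 1: otherwise keep the row(s) with amt > 0 dated 1970-01-01.
    return group[(group.amt > 0) & (group.date == pd.Timestamp("1970-01-01"))]

for group_id, data in df.groupby("id"):
    print(custom_fx(data))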
Output:
amt date id
3 20 1970-01-01 1
amt date id
5 0 2017-11-10 2
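Since the data is large, looping over groups in Python may be slow. One possible alternative is to express the two rules as boolean masks over the whole DataFrame and combine the results into a single DataFrame. The sketch below is an assumption about how that could be done, not part of the original answer; `reduce_per_id` and `EPOCH` are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "amt": [0, 0, 0, 20, 0, 0, 0, 0],
    "date": pd.to_datetime([
        "2010-02-01", "2012-05-12", "2016-08-09", "1970-01-01",
        "2016-03-21", "2017-11-10", "2012-09-01", "2016-04-15",
    ]),
})

EPOCH = pd.Timestamp("1970-01-01")  # sentinel date used in the question


def reduce_per_id(frame):
    """Hypothetical helper: apply both rules with masks instead of a loop."""
    # Rule 1: rows with a positive amount dated 1970-01-01.
    rule1 = frame[(frame["amt"] > 0) & (frame["date"] == EPOCH)]

    # Rule 2: ids whose amounts are all zero; keep the row with the latest date.
    all_zero = frame["amt"].eq(0).groupby(frame["id"]).transform("all")
    zero_rows = frame[all_zero]
    rule2 = zero_rows.loc[zero_rows.groupby("id")["date"].idxmax()]

    # Combine both rule results into one DataFrame, ordered by id.
    return pd.concat([rule1, rule2]).sort_values("id").reset_index(drop=True)


result = reduce_per_id(df)
print(result)
```

Note two differences from the loop version: if several rows of an all-zero group share the latest date, idxmax keeps only the first of them, and an id with a nonzero amt but no qualifying 1970-01-01 row simply does not appear in the result.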
Regarding python - applying multiple if/else statements to a groupby object in pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47945097/