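My dataset has two columns: Name and Date. The names repeat across different dates. I am trying to get a list of unique names, each with the earliest date on which that name appears.
Here is a sample of the data: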
Name Date
0 Smith 2021-01-02
1 Johnson 2021-01-03
2 Williams 2021-01-04
3 Brown 2021-01-05
4 Jones 2021-01-06
5 Garcia 2021-01-07
6 Miller 2021-01-08
7 Davis 2021-01-09
8 Rodriguez 2020-01-10
9 Martinez 2018-01-11
10 Smith 2021-01-12
11 Johnson 2017-01-13
12 Williams 2016-01-24
13 Brown 2011-12-15
14 Jones 2009-01-16
15 Garcia 2008-11-17
16 Miller 2021-02-18
17 Davis 2004-03-09
18 Rodriguez 2001-01-20
19 Martinez 2002-01-21
20 Smith 2010-05-22
21 Johnson 2011-12-23
22 Williams 2006-01-04
23 Brown 2006-01-25
24 Jones 2014-01-04
25 Garcia 2013-01-13
26 Miller 2021-01-28
27 Davis 2021-01-29
28 Rodriguez 2021-01-30
29 Martinez 2021-01-31
I tried using the pandas groupby function, but did not get the right result:
#Sample Data
import pandas as pd
df = pd.read_excel("SampleData2.xlsx")
df["first_test"] = df.groupby("Name")["Date"].transform("first")
df
This is the output I get:
Name Date first_test
0 Smith 2021-01-02 2021-01-02
1 Johnson 2021-01-03 2021-01-03
2 Williams 2021-01-04 2021-01-04
3 Brown 2021-01-05 2021-01-05
4 Jones 2021-01-06 2021-01-06
5 Garcia 2021-01-07 2021-01-07
6 Miller 2021-01-08 2021-01-08
7 Davis 2021-01-09 2021-01-09
8 Rodriguez 2020-01-10 2020-01-10
9 Martinez 2018-01-11 2018-01-11
10 Smith 2021-01-12 2021-01-02
11 Johnson 2017-01-13 2021-01-03
12 Williams 2016-01-24 2021-01-04
13 Brown 2011-12-15 2021-01-05
14 Jones 2009-01-16 2021-01-06
15 Garcia 2008-11-17 2021-01-07
16 Miller 2021-02-18 2021-01-08
17 Davis 2004-03-09 2021-01-09
18 Rodriguez 2001-01-20 2020-01-10
19 Martinez 2002-01-21 2018-01-11
20 Smith 2010-05-22 2021-01-02
21 Johnson 2011-12-23 2021-01-03
22 Williams 2006-01-04 2021-01-04
23 Brown 2006-01-25 2021-01-05
24 Jones 2014-01-04 2021-01-06
25 Garcia 2013-01-13 2021-01-07
26 Miller 2021-01-28 2021-01-08
27 Davis 2021-01-29 2021-01-09
28 Rodriguez 2021-01-30 2020-01-10
29 Martinez 2021-01-31 2018-01-11
However, what I want is:
Name FirstDate
Smith 5/22/2010
Johnson 12/23/2011
Williams 1/4/2006
Brown 1/25/2006
Jones 1/16/2009
Garcia 11/17/2008
Miller 1/8/2021
Davis 3/9/2004
Rodriguez 1/20/2001
Martinez 1/21/2002
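Any idea how to do this?
Best answer
Group by name, then take the minimum date. transform("first") returns the first row of each group in the frame's existing row order, not the earliest date, which is why the attempt above repeats 2021-01-02 for every Smith row. Something like this: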
df.groupby('Name').agg({'Date': 'min'})
With the df you provided, I get:
Date
Name
Brown 2006-01-25
Davis 2004-03-09
Garcia 2008-11-17
Johnson 2011-12-23
Jones 2009-01-16
Martinez 2002-01-21
Miller 2021-01-08
Rodriguez 2001-01-20
Smith 2010-05-22
Williams 2006-01-04
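If you also want the exact layout from the question (a Name column, a FirstDate column, and names in their original order rather than alphabetical), something along these lines should work. This is only a sketch: it builds a small subset of the sample rows inline instead of reading SampleData2.xlsx, it assumes the Date column is already a datetime dtype (read_excel usually parses it that way), and the variable name first_dates is just for illustration.

```python
import pandas as pd

# A small subset of the question's rows, built inline for illustration
# (the original reads them from SampleData2.xlsx).
df = pd.DataFrame({
    "Name": ["Smith", "Johnson", "Smith", "Johnson", "Smith", "Johnson"],
    "Date": pd.to_datetime([
        "2021-01-02", "2021-01-03", "2021-01-12",
        "2017-01-13", "2010-05-22", "2011-12-23",
    ]),
})

# sort=False keeps the names in order of first appearance instead of
# sorting them alphabetically; min() picks the earliest date per name.
first_dates = (
    df.groupby("Name", sort=False)["Date"]
      .min()
      .reset_index(name="FirstDate")
)
print(first_dates)
```

If the dates were read in as strings, converting them first with pd.to_datetime(df["Date"]) avoids comparing them as text.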
Regarding python - finding the first occurrence of a value by date, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68534586/