python - Groupby, shift and forward fill

Tags: python pandas dataframe

I have this df:

ID         Date   Time       Lat       Lon
 A  07/16/2019   08:00  29.39291 -98.50925
 A  07/16/2019   09:00  29.39923 -98.51256
 A  07/16/2019   10:00  29.40147 -98.51123
 A  07/18/2019   08:30  29.38752 -98.52372
 A  07/18/2019   09:30  29.39291 -98.50925
 B  07/16/2019   08:00  29.39537 -98.50402
 B  07/18/2019   11:00  29.39343 -98.49707
 B  07/18/2019   12:00  29.39291 -98.50925
 B  07/19/2019   10:00  29.39556 -98.53148

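I want to group the df by ID and Date, shift the rows back by one step, and forward fill the NaN values.

Note: if an (ID, Date) pair has only one row, it should be filled with that row itself.

For example: B 07/16/2019 08:00 29.39537 -98.50402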

Expected result:

ID         Date   Time       Lat       Lon Time.1     Lat.1     Lon.1
 A  07/16/2019   08:00  29.39291 -98.50925  09:00  29.39923 -98.51256
 A  07/16/2019   09:00  29.39923 -98.51256  10:00  29.40147 -98.51123
 A  07/16/2019   10:00  29.40147 -98.51123  10:00  29.40147 -98.51123
 A  07/18/2019   08:30  29.38752 -98.52372  09:30  29.39291 -98.50925
 A  07/18/2019   09:30  29.39291 -98.50925  09:30  29.39291 -98.50925
 B  07/16/2019   08:00  29.39537 -98.50402  08:00  29.39537 -98.50402
 B  07/18/2019   11:00  29.39343 -98.49707  12:00  29.39291 -98.50925
 B  07/18/2019   12:00  29.39291 -98.50925  12:00  29.39291 -98.50925
 B  07/19/2019   10:00  29.39556 -98.53148  10:00  29.39556 -98.53148
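As a reference for what the expected result means, here is a minimal plain-Python sketch (not from the original post) that pairs each row with the next row of the same (ID, Date) group and pairs the last or only row of a group with itself. It assumes, as in the sample, that the rows of each group are contiguous; the variable names `keys`, `target` and `paired` are illustrative.

```python
import pandas as pd

# Sample data from the question (Time kept as strings, as displayed)
df = pd.DataFrame({
    "ID": list("AAAAABBBB"),
    "Date": ["07/16/2019"] * 3 + ["07/18/2019"] * 2
            + ["07/16/2019", "07/18/2019", "07/18/2019", "07/19/2019"],
    "Time": ["08:00", "09:00", "10:00", "08:30", "09:30",
             "08:00", "11:00", "12:00", "10:00"],
    "Lat": [29.39291, 29.39923, 29.40147, 29.38752, 29.39291,
            29.39537, 29.39343, 29.39291, 29.39556],
    "Lon": [-98.50925, -98.51256, -98.51123, -98.52372, -98.50925,
            -98.50402, -98.49707, -98.50925, -98.53148],
})

cols = ["Time", "Lat", "Lon"]
keys = list(zip(df["ID"], df["Date"]))

# For each row, pick the position of the next row if it belongs to the same
# (ID, Date) group; otherwise (last or only row of a group) pick the row itself.
# Assumes the rows of each group are contiguous, as in the sample.
target = []
for i, key in enumerate(keys):
    same_group_next = i + 1 < len(keys) and keys[i + 1] == key
    target.append(i + 1 if same_group_next else i)

paired = df[cols].iloc[target].reset_index(drop=True)
result = pd.concat([df, paired.add_suffix(".1")], axis=1)
print(result)
```

This loop is only meant as a readable specification to compare the vectorized answers against, not as an efficient solution.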

The code I am using (it does not produce the expected result):

pd.concat([df, df.groupby(['ID','Date']).shift(-1).ffill()], axis=1)
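A short reproduction of why this attempt fails, sketched on the sample data: the ffill runs over the whole frame rather than per group, so the single-row groups (rows 5 and 8) inherit values from the preceding group instead of keeping their own. The concat also produces duplicated column names because no suffix is added.

```python
import pandas as pd

# Sample data from the question (Time kept as strings, as displayed)
df = pd.DataFrame({
    "ID": list("AAAAABBBB"),
    "Date": ["07/16/2019"] * 3 + ["07/18/2019"] * 2
            + ["07/16/2019", "07/18/2019", "07/18/2019", "07/19/2019"],
    "Time": ["08:00", "09:00", "10:00", "08:30", "09:30",
             "08:00", "11:00", "12:00", "10:00"],
    "Lat": [29.39291, 29.39923, 29.40147, 29.38752, 29.39291,
            29.39537, 29.39343, 29.39291, 29.39556],
    "Lon": [-98.50925, -98.51256, -98.51123, -98.52372, -98.50925,
            -98.50402, -98.49707, -98.50925, -98.53148],
})

# The attempt from the question: per-group shift, but a global forward fill
attempt = df.groupby(["ID", "Date"]).shift(-1).ffill()

# Rows 5 and 8 are single-row groups; ffill copies the previous group's values
print(attempt.loc[[5, 8], "Time"].tolist())  # ['09:30', '12:00'], expected ['08:00', '10:00']

# Without a suffix, concat yields duplicated column names
print(pd.concat([df, attempt], axis=1).columns.duplicated().any())  # True
```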

Best answer

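Solution if there are no missing values in the original data: first replace the rows of single-row groups with their original values, then forward fill the remaining missing values: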

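import pandas as pd

# m is True for rows whose (ID, Date) pair occurs only once
m = ~df.duplicated(['ID','Date']) & ~df.duplicated(['ID','Date'], keep=False)
# shift within groups, restore single-row groups from df, then ffill each group's last row from the row above (groups must be contiguous)
df1 = df.groupby(['ID','Date']).shift(-1).mask(m, df).ffill()
out = pd.concat([df, df1.add_suffix('.1')], axis=1)
print(out)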
  ID        Date   Time       Lat       Lon Time.1     Lat.1     Lon.1
0  A  07/16/2019  08:00  29.39291 -98.50925  09:00  29.39923 -98.51256
1  A  07/16/2019  09:00  29.39923 -98.51256  10:00  29.40147 -98.51123
2  A  07/16/2019  10:00  29.40147 -98.51123  10:00  29.40147 -98.51123
3  A  07/18/2019  08:30  29.38752 -98.52372  09:30  29.39291 -98.50925
4  A  07/18/2019  09:30  29.39291 -98.50925  09:30  29.39291 -98.50925
5  B  07/16/2019  08:00  29.39537 -98.50402  08:00  29.39537 -98.50402
6  B  07/18/2019  11:00  29.39343 -98.49707  12:00  29.39291 -98.50925
7  B  07/18/2019  12:00  29.39291 -98.50925  12:00  29.39291 -98.50925
8  B  07/19/2019  10:00  29.39556 -98.53148  10:00  29.39556 -98.53148
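The mask `m` marks the rows whose (ID, Date) pair occurs only once. Since `keep=False` already flags every member of a repeated pair, the first term of the condition looks redundant; the sketch below (an illustration, not part of the original answer) checks on the sample data that the shorter form and a group-size formulation give the same mask.

```python
import pandas as pd

# Sample data from the question (Time kept as strings, as displayed)
df = pd.DataFrame({
    "ID": list("AAAAABBBB"),
    "Date": ["07/16/2019"] * 3 + ["07/18/2019"] * 2
            + ["07/16/2019", "07/18/2019", "07/18/2019", "07/19/2019"],
    "Time": ["08:00", "09:00", "10:00", "08:30", "09:30",
             "08:00", "11:00", "12:00", "10:00"],
    "Lat": [29.39291, 29.39923, 29.40147, 29.38752, 29.39291,
            29.39537, 29.39343, 29.39291, 29.39556],
    "Lon": [-98.50925, -98.51256, -98.51123, -98.52372, -98.50925,
            -98.50402, -98.49707, -98.50925, -98.53148],
})
keys = ["ID", "Date"]

# Mask as written in the answer
m = ~df.duplicated(keys) & ~df.duplicated(keys, keep=False)
# keep=False flags every row of a repeated pair, so this alone is enough
m_simple = ~df.duplicated(keys, keep=False)
# Same mask expressed through the group size
m_size = df.groupby(keys)["ID"].transform("size").eq(1)

print(m[m].index.tolist())  # [5, 8]
print(m.tolist() == m_simple.tolist() == m_size.tolist())  # True
```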

To avoid a custom function, a double groupby is needed, because the forward fill has to be done per group:

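import pandas as pd
# shift per group, ffill per group (keys taken from df), then fill single-row groups from df
df1 = df.groupby(['ID','Date']).shift(-1).groupby([df['ID'],df['Date']]).ffill().fillna(df)
out = pd.concat([df, df1.add_suffix('.1')], axis=1)
print(out)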
  ID        Date   Time       Lat       Lon Time.1     Lat.1     Lon.1
0  A  07/16/2019  08:00  29.39291 -98.50925  09:00  29.39923 -98.51256
1  A  07/16/2019  09:00  29.39923 -98.51256  10:00  29.40147 -98.51123
2  A  07/16/2019  10:00  29.40147 -98.51123  10:00  29.40147 -98.51123
3  A  07/18/2019  08:30  29.38752 -98.52372  09:30  29.39291 -98.50925
4  A  07/18/2019  09:30  29.39291 -98.50925  09:30  29.39291 -98.50925
5  B  07/16/2019  08:00  29.39537 -98.50402  08:00  29.39537 -98.50402
6  B  07/18/2019  11:00  29.39343 -98.49707  12:00  29.39291 -98.50925
7  B  07/18/2019  12:00  29.39291 -98.50925  12:00  29.39291 -98.50925
8  B  07/19/2019  10:00  29.39556 -98.53148  10:00  29.39556 -98.53148
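To see what the second groupby does, this sketch splits the chain into steps and prints the intermediate state: after the per-group shift and the per-group ffill, only the single-row groups are still missing, and `fillna(df)` then copies those rows from the original frame.

```python
import pandas as pd

# Sample data from the question (Time kept as strings, as displayed)
df = pd.DataFrame({
    "ID": list("AAAAABBBB"),
    "Date": ["07/16/2019"] * 3 + ["07/18/2019"] * 2
            + ["07/16/2019", "07/18/2019", "07/18/2019", "07/19/2019"],
    "Time": ["08:00", "09:00", "10:00", "08:30", "09:30",
             "08:00", "11:00", "12:00", "10:00"],
    "Lat": [29.39291, 29.39923, 29.40147, 29.38752, 29.39291,
            29.39537, 29.39343, 29.39291, 29.39556],
    "Lon": [-98.50925, -98.51256, -98.51123, -98.52372, -98.50925,
            -98.50402, -98.49707, -98.50925, -98.53148],
})
keys = ["ID", "Date"]

# Step 1: per-group shift; the last row of every group becomes NaN
shifted = df.groupby(keys).shift(-1)
# Step 2: per-group forward fill; `shifted` has no ID/Date columns,
# so the keys are taken from the original frame
filled = shifted.groupby([df["ID"], df["Date"]]).ffill()
# Only the single-row groups are still missing at this point
print(filled[filled["Time"].isna()].index.tolist())  # [5, 8]
# Step 3: fill those rows with the row itself
result = filled.fillna(df)
```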

The solution with a lambda function would look like this:

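import pandas as pd

c = ['Time','Lat','Lon']
# group_keys=False keeps the original row index (since pandas 2.0, apply otherwise prepends the group keys)
df1 = df.groupby(['ID','Date'], group_keys=False)[c].apply(lambda x: x.shift(-1).ffill()).fillna(df)
out = pd.concat([df, df1.add_suffix('.1')], axis=1)
print(out)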
  ID        Date   Time       Lat       Lon Time.1     Lat.1     Lon.1
0  A  07/16/2019  08:00  29.39291 -98.50925  09:00  29.39923 -98.51256
1  A  07/16/2019  09:00  29.39923 -98.51256  10:00  29.40147 -98.51123
2  A  07/16/2019  10:00  29.40147 -98.51123  10:00  29.40147 -98.51123
3  A  07/18/2019  08:30  29.38752 -98.52372  09:30  29.39291 -98.50925
4  A  07/18/2019  09:30  29.39291 -98.50925  09:30  29.39291 -98.50925
5  B  07/16/2019  08:00  29.39537 -98.50402  08:00  29.39537 -98.50402
6  B  07/18/2019  11:00  29.39343 -98.49707  12:00  29.39291 -98.50925
7  B  07/18/2019  12:00  29.39291 -98.50925  12:00  29.39291 -98.50925
8  B  07/19/2019  10:00  29.39556 -98.53148  10:00  29.39556 -98.53148
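A possible simplification, an observation not made in the original answer: after a per-group `shift(-1)`, the only missing value in each group is its last row, and the expected output pairs that row with itself. Under the same assumption as the first solution (no missing values in the original data), filling from the original columns then seems to be enough, with no forward fill at all.

```python
import pandas as pd

# Sample data from the question (Time kept as strings, as displayed)
df = pd.DataFrame({
    "ID": list("AAAAABBBB"),
    "Date": ["07/16/2019"] * 3 + ["07/18/2019"] * 2
            + ["07/16/2019", "07/18/2019", "07/18/2019", "07/19/2019"],
    "Time": ["08:00", "09:00", "10:00", "08:30", "09:30",
             "08:00", "11:00", "12:00", "10:00"],
    "Lat": [29.39291, 29.39923, 29.40147, 29.38752, 29.39291,
            29.39537, 29.39343, 29.39291, 29.39556],
    "Lon": [-98.50925, -98.51256, -98.51123, -98.52372, -98.50925,
            -98.50402, -98.49707, -98.50925, -98.53148],
})
c = ["Time", "Lat", "Lon"]

# After a per-group shift(-1), only the last row of each (ID, Date) group is NaN;
# the expected output pairs that row with itself, so filling from the original
# columns is enough (assumes no NaN in the original data)
df1 = df.groupby(["ID", "Date"])[c].shift(-1).fillna(df[c])
out = pd.concat([df, df1.add_suffix(".1")], axis=1)
print(out)
```

Because the fill value comes from the same row (aligned by index), this variant does not depend on the rows of a group being contiguous, unlike the global `ffill` in the first solution.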

A similar question about python - Groupby, shift and forward fill can be found on Stack Overflow: https://stackoverflow.com/questions/59643503/
