Python Pandas: aggregate past rows when a condition is met in a time series

Tags: python pandas

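I have a time-series problem in which I want to aggregate some data based on the values that appear in one column. To illustrate, consider the following table: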

Date        colA  colB  colC
2019-01-01  1     -10   NaN
2019-01-02  2     -5    NaN
2019-01-03  3     0     101
2019-01-04  4     5     101
2019-01-05  5     10    101
2019-01-06  6     15    NaN
2019-01-07  7     20    101

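I want to do the following:

When colC is not null, aggregate the values of the rows since the previous aggregation, up to and including that row, and compute the delta of the Date column.

If element X of colC is not null but element X-1 is also not null, ignore row X.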
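In pandas terms, both rules can be expressed as boolean masks over colC. The following is a minimal sketch, assuming the column names colA/colB/colC used in the answer below and NaN for the empty cells; the variable names flag, prev_flag, keep and group_id are illustrative only. It prints the rows that survive the second rule together with the group each one falls into.

```python
import numpy as np
import pandas as pd

# Sample data from the table above (empty colC cells as NaN)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
                            "2019-01-05", "2019-01-06", "2019-01-07"]),
    "colA": [1, 2, 3, 4, 5, 6, 7],
    "colB": [-10, -5, 0, 5, 10, 15, 20],
    "colC": [np.nan, np.nan, 101, 101, 101, np.nan, 101],
})

flag = df["colC"].notna()                 # rows where colC is set
prev_flag = flag.shift(fill_value=False)  # was colC set on the previous row?
keep = ~(flag & prev_flag)                # rule 2: drop a set row that follows a set row

kept = df[keep]
# Rule 1: a group closes on each set row, so the next group starts one row later;
# shifting the flag before cumsum makes the closing row belong to its own group.
group_id = kept["colC"].notna().shift(fill_value=False).cumsum()

print(kept.assign(group=group_id))
```

Rows 2019-01-04 and 2019-01-05 are dropped by the mask, rows 2019-01-01 to 2019-01-03 form group 0, and 2019-01-06 to 2019-01-07 form group 1, which matches the expected result.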

For the table above, the result would be:

aggregate(colC)  mean(colA)  mean(colB)  delta(Date) [days]
101              2           -5          2
101              6.5         17.5        1

So far I have not found any way to do this.

Best answer

Try groupby:

import pandas as pd

# Convert the Date column to datetime if needed
df["Date"] = pd.to_datetime(df["Date"])

# Drop rows whose colC is non-null while the previous row's colC is also non-null
df2 = df[~(df["colC"].notnull() & df["colC"].shift().notnull())]

# Start a new group on the row after each non-null colC, then aggregate each group
groups = df2["colC"].notnull().shift(fill_value=False).cumsum()
output = (
    df2.groupby(groups)
    .agg({"colA": "mean",
          "colB": "mean",
          "colC": "first",
          "Date": lambda x: (x.max() - x.min()).days})
    .rename_axis(None)
    .rename(columns={"Date": "Delta"})
)

>>> output
   colA  colB   colC  Delta
0   2.0  -5.0  101.0      2
1   6.5  17.5  101.0      1
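As a cross-check of the groupby approach, the same two rules can be written as an explicit loop over the rows. This is a plain-Python sketch, not part of the original answer; the function name aggregate_until_flag, the tuple layout and the output keys are hypothetical.

```python
from datetime import date


def aggregate_until_flag(rows):
    """Hypothetical helper. rows: (date, colA, colB, colC) tuples, colC None when empty."""
    results = []
    buffer = []        # rows collected since the last closed group
    prev_set = False   # was colC set on the previous original row?
    for d, a, b, c in rows:
        is_set = c is not None
        skip = is_set and prev_set  # rule 2: set colC right after a set colC is ignored
        prev_set = is_set
        if skip:
            continue
        buffer.append((d, a, b))
        if is_set:
            # Rule 1: close the group at this row and aggregate it
            dates = [r[0] for r in buffer]
            results.append({
                "colC": c,
                "mean_colA": sum(r[1] for r in buffer) / len(buffer),
                "mean_colB": sum(r[2] for r in buffer) / len(buffer),
                "delta_days": (max(dates) - min(dates)).days,
            })
            buffer = []
    return results


rows = [
    (date(2019, 1, 1), 1, -10, None),
    (date(2019, 1, 2), 2, -5, None),
    (date(2019, 1, 3), 3, 0, 101),
    (date(2019, 1, 4), 4, 5, 101),
    (date(2019, 1, 5), 5, 10, 101),
    (date(2019, 1, 6), 6, 15, None),
    (date(2019, 1, 7), 7, 20, 101),
]
result = aggregate_until_flag(rows)
print(result)
```

One behavioral difference: this loop silently drops rows after the last non-null colC, whereas the groupby version keeps them as an extra group whose colC is NaN. Which behavior is wanted depends on the data and is an assumption here.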

A similar question about aggregating past rows in Python Pandas when a time-series condition is met can be found on Stack Overflow: https://stackoverflow.com/questions/69290348/
