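python - Conditions on NaN columns combined with other conditions in a pandas DataFrame

Tags: python pandas numpy

I want to calculate how long the server was stopped in my dataset. I know when each stop happens, but not how long it lasts.

I have this df: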

index                   a          b     c     reboot   stop
2018-06-25 12:49:00    NaN        NaN   NaN     0         1
2018-06-25 12:50:00    NaN        NaN   NaN     0         1
2018-06-25 12:51:00    NaN        NaN   NaN     1         1
2018-06-25 12:52:00    NaN        NaN   NaN     0         1
2018-06-25 12:53:00    NaN        NaN   NaN     0         1
2018-06-25 12:54:00    NaN        NaN   NaN     0         1
2018-06-25 12:55:00    NaN        NaN   NaN     0         1
2018-06-25 12:56:00    NaN        NaN   1.2     0         0
2018-06-25 12:57:00    NaN        NaN   NaN     0         1
2018-06-25 12:58:00    NaN        NaN   NaN     1         1
2018-06-25 12:59:00    NaN        NaN   NaN     0         1
2018-06-25 13:00:00    NaN        NaN   NaN     0         1
2018-06-25 13:01:00    NaN        NaN   NaN     0         0
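
To reproduce the frame above, a minimal sketch follows. It assumes that `index` is an ordinary column of minute-spaced timestamps rather than the DataFrame's index (the integer row labels in the answer's output suggest this, but the question does not say so explicitly).

```python
import numpy as np
import pandas as pd

# Assumption: 'index' is a regular column holding one timestamp per minute.
df = pd.DataFrame({
    "index": pd.date_range("2018-06-25 12:49", periods=13, freq="min"),
    "a": np.nan,
    "b": np.nan,
    # c is NaN everywhere except at 12:56, where the server is running again
    "c": [np.nan] * 7 + [1.2] + [np.nan] * 5,
    "reboot": [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "stop":   [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
})

print(df)
```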

When a, b and c are all NaN, my server stops at the row where reboot = 1 and stop = 1, and starts again at the row where reboot = 0 and stop = 0.

Expected output:

index                        period
2018-06-25 12:51:00             5
2018-06-25 12:58:00             3

Best answer

This will do what you want:

import numpy as np
import pandas as pd

# Create a new column which identifies stopped times:
# 1 where a stop begins, 0 where the server starts again
df['stopped'] = np.nan
idx_stopped = (pd.isnull(df.a)) & (pd.isnull(df.b)) & (pd.isnull(df.c)) & (df.reboot == 1) & (df.stop == 1)
df.loc[idx_stopped, 'stopped'] = 1
df.loc[(df.reboot == 0) & (df.stop == 0), 'stopped'] = 0
# Carry each state forward, then keep 1 for stopped rows and NaN for all others
df.stopped = df.stopped.ffill()
df.stopped = df.stopped.fillna(0)
df.loc[df.stopped == 0, 'stopped'] = np.nan

# Count the number of periods for each stop instance
# (a cumulative sum over the reversed column that resets at every NaN)
v = df.stopped[::-1]
cumsum = v.cumsum().ffill()
reset = -cumsum[v.isnull()].diff().fillna(cumsum)
result = v.where(v.notnull(), reset).cumsum()
df['period'] = result[::-1]

# Identify the time each stop incident began
df['first'] = (df.stopped == 1) & (pd.isnull(df.stopped.shift(1)))
# .copy() avoids a SettingWithCopyWarning on the assignment below
df2 = df.loc[df['first'], ['index', 'period']].copy()
df2.period = df2.period.astype(int)

print(df2)
                 index  period
2  2018-06-25 12:51:00       5
9  2018-06-25 12:58:00       3
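
The same result can probably be reached more directly by labelling each consecutive run of stopped rows and counting its size. The sketch below is an alternative, not the answer's method. The function name `stop_periods` is hypothetical, and the sample frame is rebuilt on the assumption that `index` is a regular timestamp column.

```python
import numpy as np
import pandas as pd

def stop_periods(df):
    """Return the start time and length (in rows) of each stop."""
    all_nan = df[["a", "b", "c"]].isna().all(axis=1)
    start = all_nan & (df["reboot"] == 1) & (df["stop"] == 1)
    end = (df["reboot"] == 0) & (df["stop"] == 0)

    # 1.0 where a stop begins, 0.0 where the server starts, NaN elsewhere;
    # forward-filling turns this into a per-row "is stopped" flag.
    state = pd.Series(np.where(start, 1.0, np.where(end, 0.0, np.nan)),
                      index=df.index)
    stopped = state.ffill().fillna(0).astype(bool)

    # A new run id begins every time the flag changes value.
    run_id = stopped.astype(int).diff().ne(0).cumsum()

    runs = df[stopped].groupby(run_id[stopped])
    out = pd.DataFrame({"index": runs["index"].first(),
                        "period": runs.size()})
    return out.reset_index(drop=True)

# Sample data from the question (assumed layout: 'index' is a column).
df = pd.DataFrame({
    "index": pd.date_range("2018-06-25 12:49", periods=13, freq="min"),
    "a": np.nan,
    "b": np.nan,
    "c": [np.nan] * 7 + [1.2] + [np.nan] * 5,
    "reboot": [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "stop":   [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
})

print(stop_periods(df))
```

On the sample data this gives the two stops from the expected output (12:51 with period 5 and 12:58 with period 3). Note that a stop which never ends would be counted through to the last row.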

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/55462233/
