python - How to forward fill non-null values in a pandas dataframe based on a set condition

Suppose I have the following dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[0,0,0,1,0,0], 'b':[0,0,1,0,0,0], 'c':[0,1,1,0,0,0]})
df.index = pd.date_range('2000-03-02', periods=6, freq='D')

It looks like this:

            a  b  c
2000-03-02  0  0  0
2000-03-03  0  0  1
2000-03-04  0  1  1
2000-03-05  1  0  0
2000-03-06  0  0  0
2000-03-07  0  0  0

Now I want to set every value that comes after the last 1 in a given column to 2. The desired result looks like this:

            a  b  c
2000-03-02  0  0  0
2000-03-03  0  0  1
2000-03-04  0  1  1
2000-03-05  1  2  2
2000-03-06  2  2  2
2000-03-07  2  2  2

I have this code, which works:

for col in df.columns:
    s = df[col]
    x = s[s == 1].index[-1]        # label of the last 1 in this column
    df.loc[df.index > x, col] = 2  # rows strictly after that label become 2

But it seems rather awkward and against the spirit of pandas (unpandonic?). Any suggestions for a better approach?

Best answer

One way is to replace the zeros with NaN and backfill, so that only the zeros below the last 1 in each column remain NaN:

In [11]: df.replace(0, np.nan).bfill()  # maybe neater way to do this?
Out[11]:
             a   b   c
2000-03-02   1   1   1
2000-03-03   1   1   1
2000-03-04   1   1   1
2000-03-05   1 NaN NaN
2000-03-06 NaN NaN NaN
2000-03-07 NaN NaN NaN
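
To make it easier to see what the backfill gives you, the result can be turned into a boolean mask. This is only a sketch: it rebuilds the example frame from the question, and the names `filled` and `keep` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 0, 1, 0, 0],
                   'b': [0, 0, 1, 0, 0, 0],
                   'c': [0, 1, 1, 0, 0, 0]},
                  index=pd.date_range('2000-03-02', periods=6, freq='D'))

# Zeros become NaN, then each NaN takes the next non-null value below it.
# Only the zeros that come after the last 1 in a column have nothing to pull from.
filled = df.replace(0, np.nan).bfill()

# True for rows up to and including the last 1 of each column, False after it.
keep = filled.notna()
print(keep)
```

The rows where `keep` is False are exactly the ones that should end up as 2.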

Now you can use where to change these to 2:

In [12]: df.where(df.replace(0, np.nan).bfill(), 2)
Out[12]:
            a  b  c
2000-03-02  0  0  0
2000-03-03  0  0  1
2000-03-04  0  1  1
2000-03-05  1  2  2
2000-03-06  2  2  2
2000-03-07  2  2  2
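
As far as I can tell, recent pandas versions expect a boolean condition in `where`, so passing the float frame directly may raise a ValueError there. Below is a minimal sketch that makes the mask explicit with `notna()`. The helper name `fill_after_last_one` is hypothetical, not something pandas provides.

```python
import numpy as np
import pandas as pd


def fill_after_last_one(frame, value=2):
    """Return a copy where entries after each column's last nonzero are `value`.

    With 0/1 data, as in the question, "last nonzero" means "last 1".
    """
    # Boolean mask: True up to and including the last nonzero entry per column.
    keep = frame.replace(0, np.nan).bfill().notna()
    # where() keeps values where the mask is True and uses `value` elsewhere.
    return frame.where(keep, value)


df = pd.DataFrame({'a': [0, 0, 0, 1, 0, 0],
                   'b': [0, 0, 1, 0, 0, 0],
                   'c': [0, 1, 1, 0, 0, 0]},
                  index=pd.date_range('2000-03-02', periods=6, freq='D'))

result = fill_after_last_one(df)
print(result)
```

A column with no 1 at all has an all-False mask, so this sketch sets the whole column to 2; whether that is the right behaviour depends on your data.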

Edit: a cumsum trick may be faster here:

In [21]: %timeit df.where(df.replace(0, np.nan).bfill(), 2)
100 loops, best of 3: 2.34 ms per loop

In [22]: %timeit df.where(df[::-1].cumsum()[::-1], 2)
1000 loops, best of 3: 1.7 ms per loop

In [23]: %timeit pd.DataFrame(np.where(np.cumsum(df.values[::-1], 0)[::-1], df.values, 2), df.index)
10000 loops, best of 3: 186 µs per loop
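
The cumsum idea can be sketched as below, with the condition made explicitly boolean. The NumPy one-liner timed above passes only the index to `pd.DataFrame`, so it would come back with default column labels; this version passes the columns as well. Variable names are illustrative, and the timings above are not re-measured here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 0, 1, 0, 0],
                   'b': [0, 0, 1, 0, 0, 0],
                   'c': [0, 1, 1, 0, 0, 0]},
                  index=pd.date_range('2000-03-02', periods=6, freq='D'))

# Reverse cumulative sum: positive while a 1 still lies at or below the row,
# zero once the last 1 of the column has been passed.
remaining = df[::-1].cumsum()[::-1]
via_pandas = df.where(remaining > 0, 2)

# Same idea on the raw arrays, skipping the pandas overhead.
values = df.to_numpy()
remaining_np = np.cumsum(values[::-1], axis=0)[::-1]
via_numpy = pd.DataFrame(np.where(remaining_np > 0, values, 2),
                         index=df.index, columns=df.columns)

print(via_pandas)
```

Both variants rely on the data being non-negative, so that the reversed running total cannot return to zero while a 1 remains below.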

This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/22083378/
