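python - Simplifying a pandas expression

This may not be a great question, but here it is:
I am doing a very simple calculation that combines several columns of a DataFrame, but only for rows where one of the columns has a specific value. The idea, in pseudocode, is: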

if df.x==1:
    df.y = df.y - df.a/df.b

Looping is of course slow, so I do this on a subset of the DataFrame instead, but that quickly gets very verbose:

df.loc[df.x==1, 'y'] = df.loc[df.x==1, 'y'] - df.loc[df.x==1, 'a']/df.loc[df.x==1, 'b']

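I have a feeling there is a better way to do this. Any ideas?

Best answer

As @EdChum mentioned in his comment, you can use .loc directly, and you can simplify further with the -= operator. The right-hand side does not need to be subset, because pandas aligns it by index with the rows selected on the left when assigning.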

import pandas as pd

# Small example frame; only the first row has x == 1
df = pd.DataFrame({'x': [1, 2, 3],
                   'y': [1, 2, 3],
                   'a': [1, 2, 3],
                   'b': [2, 2, 2]})

>>> df
   a  b  x  y
0  1  2  1  1
1  2  2  2  2
2  3  2  3  3

df.loc[df.x==1, 'y'] -= df.a / df.b

>>> df
   a  b  x    y
0  1  2  1  0.5
1  2  2  2  2.0
2  3  2  3  3.0
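
As a quick check that the short form matches the verbose one from the question, here is a minimal sketch. The sample data, the names mask, verbose and short, and the choice to make y a float column are assumptions for illustration, not part of the original answer. The float column is used because recent pandas versions may warn about, or reject, writing float results into an integer column.

```python
import pandas as pd

# Hypothetical sample data; y is float so the in-place update needs no dtype change.
df = pd.DataFrame({'x': [1, 2, 1],
                   'y': [1.0, 2.0, 3.0],
                   'a': [1, 2, 3],
                   'b': [2, 2, 2]})

# Compute the boolean mask once and reuse it.
mask = df.x == 1

# Verbose form from the question: every operand is subset explicitly.
verbose = df.copy()
verbose.loc[mask, 'y'] = (verbose.loc[mask, 'y']
                          - verbose.loc[mask, 'a'] / verbose.loc[mask, 'b'])

# Short form from the answer: the right-hand side covers all rows,
# and pandas aligns it by index with the rows selected on the left.
short = df.copy()
short.loc[mask, 'y'] -= short.a / short.b

# Both forms should leave the rows where x != 1 untouched.
print(short.equals(verbose))
```

Storing the mask in a variable also shortens the verbose form, which may be enough on its own when the right-hand side has to stay subset for other reasons.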

Regarding python - simplifying a pandas expression, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32381834/
