python - Pandas .filter() method with a lambda function

I am trying to understand the .filter() method in Pandas. I'm not sure why the code below doesn't work:

# Load data
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# Set arbitrary index (is this needed?) and try filtering:
indexed_df = df.copy().set_index('sepal width (cm)')
test = indexed_df.filter(lambda x: x['petal length (cm)'] > 1.4)

I get:

TypeError: 'function' object is not iterable

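I appreciate that there are simpler ways to do this (e.g. boolean indexing), but for learning purposes I am trying to understand why filter fails here when it works with groupby, as shown below.

This works: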

filtered_df = df.groupby('petal width (cm)').filter(lambda x: x['sepal width (cm)'].sum() > 50)

Best answer

You can use the condition indexed_df['petal length (cm)'] > 1.4 (note that we use indexed_df here, not x) to filter the dataframe directly:

indexed_df[indexed_df['petal length (cm)'] > 1.4]
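For context on the error in the question: as far as I know, DataFrame.filter selects rows or columns by their labels (through items, like or regex) and never looks at the values, so a lambda passed as the first argument is treated as items and raises a TypeError. GroupBy.filter is a different method that calls the function once per group and keeps the groups for which it returns True. Here is a minimal sketch of the three behaviours, assuming a small made-up frame named toy (illustrative values, not the iris data):

```python
import pandas as pd

# Toy data standing in for the iris frame (values are made up for illustration).
toy = pd.DataFrame({
    "sepal width (cm)": [3.5, 3.0, 3.2, 3.1],
    "petal length (cm)": [1.4, 1.3, 4.7, 5.1],
    "petal width (cm)": [0.2, 0.2, 1.4, 1.4],
})

# DataFrame.filter works on labels: keep columns whose name contains "petal".
petal_cols = toy.filter(like="petal", axis=1)

# GroupBy.filter works on values: the function receives each group as a
# DataFrame and must return a single bool; groups returning True are kept.
wide_groups = toy.groupby("petal width (cm)").filter(
    lambda g: g["sepal width (cm)"].sum() > 6.4
)

# Selecting rows by value on a plain DataFrame uses a boolean mask instead.
long_petals = toy[toy["petal length (cm)"] > 1.4]

print(petal_cols.columns.tolist())
print(wide_groups)
print(long_petals)
```

The two methods share a name but not a purpose, which is why the lambda works after groupby and not on the plain dataframe.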

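How does this work?

If you evaluate indexed_df['petal length (cm)'], you get a "column" of the dataframe: a Series that holds, for each index label, the value of that column. Evaluating column > 1.4 then produces a boolean Series: True where the row satisfies the condition, False otherwise.

We can then use such a boolean Series as the key in indexed_df[boolean_column] to keep only the rows for which the corresponding entry of boolean_column is True.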
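The steps above can be traced one at a time. This is a minimal sketch, assuming a small hand-made frame named indexed_toy that mimics indexed_df (the values are illustrative, not taken from the iris data):

```python
import pandas as pd

# Toy frame indexed by sepal width, mimicking indexed_df (values are made up).
indexed_toy = pd.DataFrame(
    {"petal length (cm)": [1.4, 1.3, 4.7, 5.1]},
    index=pd.Index([3.5, 3.0, 3.2, 3.1], name="sepal width (cm)"),
)

# Step 1: selecting one column gives a Series, one value per index label.
column = indexed_toy["petal length (cm)"]

# Step 2: the comparison is applied element-wise and gives a boolean Series
# that shares the same index as the column.
boolean_column = column > 1.4

# Step 3: indexing the frame with the boolean Series keeps the True rows.
selected = indexed_toy[boolean_column]

print(boolean_column.tolist())  # [False, False, True, True]
print(selected)
```

Note that the first row is dropped because the comparison is strict: 1.4 > 1.4 is False.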

Source: the question "python - Pandas .filter() method with a lambda function" on Stack Overflow: https://stackoverflow.com/questions/48304854/
