python - How to classify observations based on covariates in a dataframe and numpy?

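I have a dataset with n observations and, say, two variables X1 and X2. I am trying to classify each observation based on a set of conditions on its (X1, X2) values. For example, the dataset looks like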

df:
Index     X1    X2
1         0.2   0.8
2         0.6   0.2
3         0.2   0.1
4         0.9   0.3

The groups are defined by

Group 1: X1<0.5 & X2>=0.5
Group 2: X1>=0.5 & X2>=0.5
Group 3: X1<0.5 & X2<0.5
Group 4: X1>=0.5 & X2<0.5

I would like to generate the following dataframe.

expected result:
Index     X1    X2    Group
1         0.2   0.8   1
2         0.6   0.2   4
3         0.2   0.1   3
4         0.9   0.3   4

Also, would it be better/faster to use numpy arrays for this kind of problem?

Best answer

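In answer to your last question, I definitely think pandas is a good tool for this; it can be done in numpy, but pandas is arguably more intuitive when working with dataframes, and it is fast enough for most applications. pandas and numpy also work well together. For example, in your case, you can use numpy.select to build your pandas column: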

import numpy as np
import pandas as pd
# Lay out your conditions
conditions = [(df.X1 < 0.5) & (df.X2 >= 0.5),
              (df.X1 >= 0.5) & (df.X2 >= 0.5),
              (df.X1 < 0.5) & (df.X2 < 0.5),
              (df.X1 >= 0.5) & (df.X2 < 0.5)]

# Name the resulting groups (in the same order as the conditions)
choicelist = [1,2,3,4]

df['group']= np.select(conditions, choicelist, default=-1)

# Above, I've set the default to -1, but change it as you see fit.
# If none of your conditions are met for a row, that row is classified as -1.

>>> df
   Index   X1   X2  group
0      1  0.2  0.8      1
1      2  0.6  0.2      4
2      3  0.2  0.1      3
3      4  0.9  0.3      4
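
On the numpy part of the question: np.select only needs boolean arrays, so the same logic can be run on plain numpy arrays and the result assigned back to the dataframe afterwards. Below is a minimal, self-contained sketch of that idea using the sample data from the question. The helper name classify and its default argument are hypothetical, introduced only for illustration, and I have not benchmarked it against the pandas version.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"X1": [0.2, 0.6, 0.2, 0.9], "X2": [0.8, 0.2, 0.1, 0.3]},
    index=pd.Index([1, 2, 3, 4], name="Index"),
)


def classify(x1, x2, default=-1):
    """Return group labels 1-4 for array-likes x1 and x2 (hypothetical helper)."""
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    # Conditions in the same order as the group labels below.
    conditions = [
        (x1 < 0.5) & (x2 >= 0.5),
        (x1 >= 0.5) & (x2 >= 0.5),
        (x1 < 0.5) & (x2 < 0.5),
        (x1 >= 0.5) & (x2 < 0.5),
    ]
    return np.select(conditions, [1, 2, 3, 4], default=default)


# Classify on plain numpy arrays, then store the labels as a new column.
groups = classify(df["X1"].to_numpy(), df["X2"].to_numpy())
df["Group"] = groups
print(df)
```

Because comparisons against NaN evaluate to False, a row with a missing X1 or X2 matches none of the four conditions and receives the default label, so choosing a default outside 1-4 makes such rows easy to spot.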

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/49061921/
