python - Want to know how many objects fall in the overlap of two different subsets

Tags: python pandas numpy if-statement multidimensional-array

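I have one category defined by certain characteristics (height and weight, defined with np.where) and a different category defined by other characteristics (whether someone is a twin and how many siblings they have, also defined with np.where). I want to see how many people belong to both categories, i.e. how many people sit in the center of the Venn diagram.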

I am importing the columns of a CSV file. The table looks like this:

    Child  Inches  Weight Twin  Siblings
0     A      53     100    Y         3
1     B      54     110    N         4
2     C      56     120    Y         2
3     D      58     165    Y         1
4     E      60     150    N         1
5     F      62     160    N         1
6     H      65     165    N         3
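To try the code without the CSV file, here is a minimal sketch that builds the same table in memory with pd.DataFrame. It assumes the CSV contains exactly these five columns and seven rows; the name df is chosen to match the answer.

```python
import pandas as pd

# The table from the question, built in memory
# (assumption: the CSV has exactly these columns and values)
df = pd.DataFrame({
    "Child": ["A", "B", "C", "D", "E", "F", "H"],
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})
print(df)
```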

My code:

import pandas as pd
import numpy as np

file = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')
#%%
height = file["Inches"]
weight = file["Weight"]
twin = file["Twin"]
siblings = file["Siblings"]
#%%
area1 = np.where((height <= 60) & (weight <= 150))[0]
#%%
#has two or more siblings (and is a twin)
group_a = np.where((siblings >= 2) & (twin == 'Y'))[0]

#has two or more siblings (and is not a twin)
group_b = np.where((siblings >= 2) & (twin == 'N'))[0]

#has only one sibling (and is twin)
group_c = np.where((siblings == 1) & (twin == 'Y'))[0]

#has only one sibling (and is not a twin)
group_d = np.where((siblings == 1) & (twin == 'N'))[0]
#%%
for i in area1:
    if group_a==True:
        print("in area1 there are", len(i), "children in group_a")
    elif group_b==True:
        print("in area1 there are", len(i), "children in group_b")  
    elif group_c==True:
        print("in area1 there are", len(i), "children in group_c")
    elif group_d==True:
        print("in area1 there are", len(i), "children in group_d")

I get the error: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
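The error comes from the loop: group_a (like the other groups) is a NumPy array of row positions returned by np.where(...)[0], so group_a == True is evaluated element by element and yields a boolean array, and if cannot reduce an array with more than one element to a single True/False. A second problem would surface next: each i taken from area1 is a single integer, so len(i) raises TypeError. A small sketch reproducing both, using the hard-coded positions [0, 2] that group_a contains for the sample table:

```python
import numpy as np

# Row positions, as np.where(...)[0] returns them (group_a for the sample table)
group_a = np.array([0, 2])

# Element-wise comparison: True is treated as 1, giving array([False, False])
comparison = group_a == True
print(comparison)

raised_value_error = False
try:
    if comparison:  # more than one element: no single truth value
        pass
except ValueError as exc:
    raised_value_error = True
    print("ValueError:", exc)

# Iterating over an index array yields single integers, which have no length
i = np.int64(0)
raised_type_error = False
try:
    len(i)
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)
```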

I expect output like this:

"in area1 there are 2 children in group_a"
"in area1 there are 1 children in group_b"
"in area1 there are 0 children in group_c"
"in area1 there are 1 children in group_d"

Thanks in advance!

Best answer

For your example I would use a slightly different design: add a 0/1 flag column for each condition. You can do it like this:

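import numpy as np
import pandas as pd

df = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')  # same file as in the question
# each column is 1 if the row meets the condition, else 0
df['area1'] = np.where((df.Inches <= 60) & (df.Weight <= 150), 1, 0)
df['group_a'] = np.where((df.Siblings >= 2) & (df.Twin == 'Y'), 1, 0)
df['group_b'] = np.where((df.Siblings >= 2) & (df.Twin == 'N'), 1, 0)
df['group_c'] = np.where((df.Siblings == 1) & (df.Twin == 'Y'), 1, 0)
df['group_d'] = np.where((df.Siblings == 1) & (df.Twin == 'N'), 1, 0)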

The result is the original table with five extra 0/1 columns: area1, group_a, group_b, group_c and group_d.

From here you can build queries. For example, to count the children in area1 who are in group_b:

df.groupby('area1')['group_b'].sum().loc[1]

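This returns the result you want: 1. Choose the aggregation to fit the question you are asking: sum adds up the 0/1 flags, so it counts only the flagged children, while count counts every row in each area1 group.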
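Instead of one groupby per column, here is a sketch that gets all four counts at once by keeping only the area1 rows and summing the flag columns. The in-memory sample data is an assumption standing in for the CSV; the flag columns are built as in the answer, and group_cols is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed to match the CSV)
df = pd.DataFrame({
    "Child": ["A", "B", "C", "D", "E", "F", "H"],
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# 0/1 flag columns, as in the answer
df["area1"] = np.where((df.Inches <= 60) & (df.Weight <= 150), 1, 0)
df["group_a"] = np.where((df.Siblings >= 2) & (df.Twin == "Y"), 1, 0)
df["group_b"] = np.where((df.Siblings >= 2) & (df.Twin == "N"), 1, 0)
df["group_c"] = np.where((df.Siblings == 1) & (df.Twin == "Y"), 1, 0)
df["group_d"] = np.where((df.Siblings == 1) & (df.Twin == "N"), 1, 0)

group_cols = ["group_a", "group_b", "group_c", "group_d"]

# Keep the area1 rows only, then add up each flag column
counts = df.loc[df["area1"] == 1, group_cols].sum()
print(counts)

# Same numbers via groupby: one row per area1 value (0 = outside, 1 = inside)
summary = df.groupby("area1")[group_cols].sum()
print(summary)
```

The groupby version also shows the counts outside area1 in the row labelled 0, which is handy for checking that every child lands in exactly one group.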

Finally, to loop over all four groups:

# columns 6 onward are group_a ... group_d (column 5 is area1)
for col in df.columns[6:]:
    r = df.groupby('area1')[col].sum().loc[1]
    print("in area1 there are", r, "children in", col)

This produces:

in area1 there are 2 children in group_a
in area1 there are 1 children in group_b
in area1 there are 0 children in group_c
in area1 there are 1 children in group_d
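If you would rather keep the np.where position arrays from the question, the overlap can also be counted directly with np.intersect1d. This is a swapped-in technique, not the answer's method: the center of the Venn diagram is simply the intersection of two arrays of row positions. The in-memory sample data and the groups/counts dictionaries are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed to match the CSV)
df = pd.DataFrame({
    "Child": ["A", "B", "C", "D", "E", "F", "H"],
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})
height, weight = df["Inches"], df["Weight"]
twin, siblings = df["Twin"], df["Siblings"]

# Row positions, computed the same way as in the question
area1 = np.where((height <= 60) & (weight <= 150))[0]
groups = {
    "group_a": np.where((siblings >= 2) & (twin == "Y"))[0],
    "group_b": np.where((siblings >= 2) & (twin == "N"))[0],
    "group_c": np.where((siblings == 1) & (twin == "Y"))[0],
    "group_d": np.where((siblings == 1) & (twin == "N"))[0],
}

# Overlap = positions present in both arrays
counts = {}
for name, rows in groups.items():
    counts[name] = len(np.intersect1d(area1, rows))
    print("in area1 there are", counts[name], "children in", name)
```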

For python - Want to know how many objects fall in the overlap of two different subsets, there is a similar question on Stack Overflow: https://stackoverflow.com/questions/57065716/
