python - Subsetting a pandas DataFrame in a for loop

I am trying to create subsets of a DataFrame iteratively. A toy example:

In:

   A  B  participant  
0  1  3            1          
1  2  4            1         
2  5  8            2          
3  4  9            2
4  3  7            3

(Conditional statement courtesy of the commenters below)

for p in df:
    subset = df[df['participant'] == p].loc[: , 'A']

The desired result is:

   A  participant  
0  1            1          
1  2            1

   A  participant  
0  5            2          
1  4            2

and so on.

However, the for loop produces subsets by row rather than by participant. How can I get a subset for each participant?

Initial attempt:

for p in df:
    p.pressure = df[(:, 'pressure') & (df['participant'] == p)]

Best answer

Here is one way to do it.

First, get the unique participant values:

participants = df['participant'].unique()
#array([1, 2, 3])
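
A minimal sketch of why this step matters, assuming the toy data from the question (the `column_labels` name is only for illustration): iterating over a DataFrame directly yields its column labels, so a plain `for p in df` never sees participant values, whereas `unique()` gives one entry per participant.

```python
import pandas as pd

# Toy data reconstructed from the question's example table.
df = pd.DataFrame({
    "A": [1, 2, 5, 4, 3],
    "B": [3, 4, 8, 9, 7],
    "participant": [1, 1, 2, 2, 3],
})

# Iterating over a DataFrame directly yields its column labels,
# not rows and not participant values.
column_labels = [label for label in df]

# Iterating over the unique values yields one entry per participant,
# in order of first appearance.
participants = [int(p) for p in df["participant"].unique()]

print(column_labels)
print(participants)
```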

Now create a DataFrame for each participant. In this example, I store each DataFrame in a dictionary keyed by participant number.

output_dfs = {p: df[df['participant'] == p] for p in participants}
for p in output_dfs:
    print("Participant = %s"%p)
    print(output_dfs[p])
    print("")

This prints:

Participant = 1
   A  B  participant
0  1  3            1
1  2  4            1

Participant = 2
   A  B  participant
2  5  8            2
3  4  9            2

Participant = 3
   A  B  participant
4  3  7            3
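
As an alternative sketch, not part of the original answer, `groupby` can produce the same per-participant split in one pass. The version below also keeps only the `A` and `participant` columns and resets the index, on the assumption that the question's desired output (index restarting at 0 for each subset) is what is wanted; the `subsets` name is illustrative.

```python
import pandas as pd

# Toy data reconstructed from the question's example table.
df = pd.DataFrame({
    "A": [1, 2, 5, 4, 3],
    "B": [3, 4, 8, 9, 7],
    "participant": [1, 1, 2, 2, 3],
})

# Iterating over a groupby object yields (key, sub-DataFrame) pairs,
# one pair per distinct participant value.
subsets = {
    int(p): group[["A", "participant"]].reset_index(drop=True)
    for p, group in df.groupby("participant")
}

for p, subset in subsets.items():
    print(f"Participant = {p}")
    print(subset)
    print()
```

Dropping `reset_index(drop=True)` keeps the original row labels, which matches the output printed by the answer above rather than the question's desired output.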

This question, on subsetting a pandas DataFrame in a for loop, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/49242992/
