python - Pandas DataFrame: get column item when the corresponding item in another column is greater than a value

I have the following Pandas DataFrame. It is large, with more than 500,000 rows.

    Event_Number  Well  p_and_s
0              1     7      4.0
1              1     9      0.0
2              1    15      0.0
3              2     7      2.0
4              2     9      7.0
5              2    15      0.0
6              3     5      0.0
7              3     7      8.0
8              3    16      3.0
9              4     7      8.0
10             4    15      0.0
11             5     7      8.0
12             5     9      3.0
13             5    15      6.0
14             6     5      0.0
15             6     7      8.0
16             7     7      8.0
17             7     9      0.0
18             7    15      0.0
19             8     7      8.0
20             8    15      4.0

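For each group in the Event_Number column, I want to find which values in the Well column have a p_and_s value greater than 2.

The final DataFrame should look like this, with a new column listing every Well in that event whose p_and_s is greater than 2: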

    Event_Number  Well  p_and_s  well_array
0              1     7      4.0         [7]
1              1     9      0.0         [7]
2              1    15      0.0         [7]
3              2     7      2.0         [9]
4              2     9      7.0         [9]
5              2    15      0.0         [9]
6              3     5      0.0     [7, 16]
7              3     7      8.0     [7, 16]
8              3    16      3.0     [7, 16]
9              4     7      8.0         [7]
10             4    15      0.0         [7]
11             5     7      8.0  [7, 9, 15]
12             5     9      3.0  [7, 9, 15]
13             5    15      6.0  [7, 9, 15]
14             6     5      0.0         [7]
15             6     7      8.0         [7]
16             7     7      8.0         [7]
17             7     9      0.0         [7]
18             7    15      0.0         [7]
19             8     7      8.0     [7, 15]
20             8    15      4.0     [7, 15]

Best answer

Here is one way.

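# Keep rows with p_and_s > 2, then collect the Well values of each event into a list
s = df[df['p_and_s'] > 2].groupby('Event_Number')['Well'].apply(list)
# Look up each row's Event_Number in s to attach the list to every row
df['well_array'] = df['Event_Number'].map(s)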

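Explanation

After applying the filter on p_and_s, create a Series that maps each Event_Number to its list of Well values.
Map it back onto the original DataFrame via pd.Series.map.
For performance, avoid lambda functions where possible, since they amount to expensive implicit Python-level loops.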
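To make the two steps concrete, here is a minimal, self-contained sketch. It rebuilds only the first three events from the question's sample data (the DataFrame construction is added here for illustration) and then applies the same filter, groupby and map calls as the answer.

```python
import pandas as pd

# Illustrative subset: events 1 to 3 from the question's sample data.
df = pd.DataFrame({
    "Event_Number": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Well":         [7, 9, 15, 7, 9, 15, 5, 7, 16],
    "p_and_s":      [4.0, 0.0, 0.0, 2.0, 7.0, 0.0, 0.0, 8.0, 3.0],
})

# Step 1: filter on p_and_s, then build a Series indexed by Event_Number
# whose values are the lists of qualifying Well numbers.
s = df[df["p_and_s"] > 2].groupby("Event_Number")["Well"].apply(list)

# Step 2: map each row's Event_Number through that Series.
df["well_array"] = df["Event_Number"].map(s)

print(s)
print(df)
```

Here s holds one entry per event (1, 2 and 3), and every row of an event receives that event's list. Well 7 in event 2 is excluded because its p_and_s is exactly 2.0, which is not greater than 2.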

Result

    Event_Number  Well  p_and_s  well_array
0              1     7      4.0         [7]
1              1     9      0.0         [7]
2              1    15      0.0         [7]
3              2     7      2.0         [9]
4              2     9      7.0         [9]
5              2    15      0.0         [9]
6              3     5      0.0     [7, 16]
7              3     7      8.0     [7, 16]
8              3    16      3.0     [7, 16]
9              4     7      8.0         [7]
10             4    15      0.0         [7]
11             5     7      8.0  [7, 9, 15]
12             5     9      3.0  [7, 9, 15]
13             5    15      6.0  [7, 9, 15]
14             6     5      0.0         [7]
15             6     7      8.0         [7]
16             7     7      8.0         [7]
17             7     9      0.0         [7]
18             7    15      0.0         [7]
19             8     7      8.0     [7, 15]
20             8    15      4.0     [7, 15]
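One edge case does not occur in the sample data: an event with no row above the threshold is missing from s, so map leaves NaN in well_array for its rows. The sketch below uses a small made-up DataFrame to show this, and one possible way to get empty lists instead. The names mapping and filled are hypothetical, and the fallback to an empty list is an assumption about the desired behavior, not part of the original answer.

```python
import pandas as pd

# Made-up data: event 2 has no row with p_and_s > 2.
df = pd.DataFrame({
    "Event_Number": [1, 1, 2, 2],
    "Well":         [7, 9, 7, 9],
    "p_and_s":      [4.0, 0.0, 1.0, 0.0],
})

s = df[df["p_and_s"] > 2].groupby("Event_Number")["Well"].apply(list)

# Event 2 is absent from s, so its rows come back as NaN.
df["well_array"] = df["Event_Number"].map(s)

# Possible alternative: look events up in a plain dict and fall back to [].
mapping = s.to_dict()
filled = pd.Series(
    [mapping.get(event, []) for event in df["Event_Number"]],
    index=df.index,
)
df["well_array_filled"] = filled
```

The dict lookup is an explicit Python loop, so on a very large frame it may be slower than map. It is shown only because fillna does not accept a list as the fill value.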

For "python - Pandas DataFrame: get column item when the corresponding item in another column is greater than a value", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49970716/
