python - Pandas DataFrame: Get value pairs from subsets of a DataFrame

I have a df:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3, 4, 4, 4],
                   "name": ["call", "response", "call", "call", "response", "call", "call", "response", "response"]})

    id  name
0   1   call
1   1   response
2   2   call
3   2   call
4   2   response
5   3   call
6   4   call
7   4   response
8   4   response

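I'm trying to extract call-response pairs, where the correct pattern is a call followed by the first response after it. Each call and its response must belong to the same id subset, like this: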

    id  name
0   1   call
1   1   response
3   2   call
4   2   response
6   4   call
7   4   response

Ideally I would keep the original index in the DataFrame so that I can later use df.loc with those index labels.
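For reference, filtering a DataFrame with a boolean mask does not reset the index: the surviving rows keep their original labels, and those labels can be passed straight back to df.loc. A minimal sketch using the sample data above (the name `subset` is just for illustration):

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4, 4, 4],
    "name": ["call", "response", "call", "call", "response",
             "call", "call", "response", "response"],
})

# Boolean-mask filtering keeps the original index labels (no reset)
subset = df[df["id"] == 2]
print(subset.index.tolist())  # [2, 3, 4]

# Those labels can be reused directly with df.loc on the full frame
print(df.loc[subset.index, "name"].tolist())  # ['call', 'call', 'response']
```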

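What I've tried is iterating over df subset by subset and applying something, or using a rolling window, but so far I have only gotten errors.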

unique_ids = df.id.unique()

for unique_id in unique_ids:
    # `something` stands for the per-subset logic I could not get working
    df.query('id == @unique_id').apply(something)
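For comparison, a per-subset loop in the spirit of this attempt can be made to work by iterating over `df.groupby('id')` instead of querying each id separately. This is a plain-Python alternative, not the accepted answer's vectorized approach, and the helper name `call_response_rows` is hypothetical. It assumes "pair" means a call immediately followed by a response within the same id:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4, 4, 4],
    "name": ["call", "response", "call", "call", "response",
             "call", "call", "response", "response"],
})

def call_response_rows(frame):
    """Hypothetical helper: return the rows of every call that is
    immediately followed by a response within the same id."""
    keep = []
    # groupby yields (key, sub-DataFrame) tuples; sub-frames keep original labels
    for _, group in frame.groupby("id", sort=False):
        labels = group.index.tolist()
        names = group["name"].tolist()
        # Look at each adjacent pair of rows inside this id
        for i in range(len(names) - 1):
            if names[i] == "call" and names[i + 1] == "response":
                keep.extend([labels[i], labels[i + 1]])
    return frame.loc[keep]

result = call_response_rows(df)
print(result.index.tolist())  # [0, 1, 3, 4, 6, 7]
```

The loop is easy to follow but runs in Python per row, so the vectorized shift-based approach is usually preferable on large frames.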

I haven't found anything that works specifically on subsets of a DataFrame.

Best answer

Use DataFrameGroupBy.shift to look at the neighbouring row within each id, compare it with Series.eq to check for equality, and filter with boolean indexing:

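# m1: a "call" whose next row within the same id is a "response"
m1 = df['name'].eq('call') & df.groupby('id')['name'].shift(-1).eq('response')
# m2: a "response" whose previous row within the same id is a "call"
m2 = df['name'].eq('response') & df.groupby('id')['name'].shift().eq('call')
# Keep rows matching either mask; the original index labels are preserved
df2 = df[m1 | m2]

print(df2)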
   id      name
0   1      call
1   1  response
3   2      call
4   2  response
6   4      call
7   4  response
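A self-contained version of the answer is below, useful for checking it end to end. The pairing step at the end is not part of the original answer: it is one possible way (an assumption about the desired output) to turn the mask into explicit (call, response) index pairs for use with df.loc, and the names `next_label` and `pairs` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4, 4, 4],
    "name": ["call", "response", "call", "call", "response",
             "call", "call", "response", "response"],
})

g = df.groupby("id")["name"]
# A call whose next row (same id) is a response; NaN from shift compares as False
m1 = df["name"].eq("call") & g.shift(-1).eq("response")
# A response whose previous row (same id) is a call
m2 = df["name"].eq("response") & g.shift().eq("call")
df2 = df[m1 | m2]

# Hypothetical extra step: label of the next row within the same id
next_label = df.index.to_series().groupby(df["id"]).shift(-1)
# Pair each matching call with the label of the response that follows it
pairs = [(int(c), int(next_label[c])) for c in df.index[m1.to_numpy()]]
print(pairs)  # [(0, 1), (3, 4), (6, 7)]

# The labels work directly with df.loc
print(df.loc[[resp for _, resp in pairs], "name"].tolist())
```

Because the masks only compare adjacent rows inside each id, a sequence such as call, call, response (id 2) pairs the response with the call immediately before it, which matches the expected output in the question.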

Regarding python - Pandas DataFrame: Get value pairs from subsets of dataframe, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67487367/
