python - Subsetting a DataFrame using value pairs from another DataFrame

I have a DataFrame called IncorrectQuestions_df in which each row holds a pair of questions in the columns Question1 and Question2.

The same question pair can be repeated many, many times in this DataFrame. In total it has about 50 million records.

From this DataFrame I build a second DataFrame that counts how many times each question pair occurs. This is the code I use:

import pandas as pd

# Number of rows for each (Question1, Question2) pair, as a one-column DataFrame
IncorrectQuestions_count = (IncorrectQuestions_df
                            .groupby(['Question1', 'Question2'])
                            .size()
                            .to_frame('Count'))
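A related sketch, not taken from the original post: instead of building a separate count table, the size of each pair's group can be broadcast back onto every row with `groupby(...).transform('size')`. Keeping only pairs seen at least 5 times then becomes a single boolean mask over the original frame, with no loops. The toy data below is hypothetical and only stands in for the real 50-million-row frame.

```python
import pandas as pd

# Hypothetical toy data standing in for IncorrectQuestions_df
IncorrectQuestions_df = pd.DataFrame({
    'Question1': [1, 1, 1, 1, 1, 2, 2, 3],
    'Question2': [7, 7, 7, 7, 7, 8, 8, 9],
})

# For every row, the number of rows sharing its (Question1, Question2) pair;
# the result is aligned with the original index
pair_size = (IncorrectQuestions_df
             .groupby(['Question1', 'Question2'])['Question1']
             .transform('size'))

# Keep only rows whose pair occurs at least 5 times
subset = IncorrectQuestions_df[pair_size >= 5]
print(subset)
```

Because the mask is built on the original index, the surviving rows keep their original row labels.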

Now I only want to keep the pairs that occur at least 5 times, so I subset with the following code:

IncorrectQuestions_count = IncorrectQuestions_count[IncorrectQuestions_count['Count'] >= 5]
IncorrectQuestions_count.reset_index(inplace=True)

This gives me a DataFrame with the columns Question1, Question2 and Count.
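To make the shape of that count table concrete, here is a small sketch that runs the same two steps (count per pair, then keep counts of at least 5 and reset the index) on hypothetical toy data; the values are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for IncorrectQuestions_df
IncorrectQuestions_df = pd.DataFrame({
    'Question1': [1, 1, 1, 1, 1, 2, 2, 3],
    'Question2': [7, 7, 7, 7, 7, 8, 8, 9],
})

# Number of rows per (Question1, Question2) pair, indexed by the pair
IncorrectQuestions_count = (IncorrectQuestions_df
                            .groupby(['Question1', 'Question2'])
                            .size()
                            .to_frame('Count'))

# Keep pairs seen at least 5 times and turn the pair index back into columns
IncorrectQuestions_count = IncorrectQuestions_count[IncorrectQuestions_count['Count'] >= 5]
IncorrectQuestions_count = IncorrectQuestions_count.reset_index()
print(IncorrectQuestions_count)
```

Only the pair (1, 7) survives here, with a Count of 5.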

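IncorrectQuestions_count contains about 80,000 pairs. I want to subset IncorrectQuestions_df so that it only contains the pairs that are present in IncorrectQuestions_count. Doing this with two nested for loops would take a very long time to run, so is there a more Pythonic way to do it?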

Any pointers would be appreciated.

TIA。

Best answer

You can do this with a merge.

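import pandas as pd
from numpy.random import randint

# Toy data: 26 rows with random question pairs (q1, q2) and a label column
df = pd.DataFrame({'q1': randint(0, 3, 26), 'q2': randint(0, 3, 26),
                   'state': [chr(i + ord('a')) for i in range(26)]})

# For each (q1, q2) pair: True if the pair occurs at least 5 times
cnts = df.groupby(['q1', 'q2']).count() > 4

# Look up each row's pair in cnts; the boolean comes back as state_y
df['ok'] = df.merge(cnts, left_on=cnts.index.names, right_index=True).state_y
result = df[df.ok]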

For this example, result looks like this (the rows differ from run to run because the data are random):

    q1  q2   state      ok
0    2   1       a    True
2    2   1       c    True
9    2   1       j    True
22   2   1       w    True
24   2   1       y    True
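Applied directly to the two frames from the question, the same merge idea can be sketched as an inner merge on the pair columns, using only the key columns of the count table so that no Count column is added. The toy frames below are hypothetical stand-ins for IncorrectQuestions_df and IncorrectQuestions_count.

```python
import pandas as pd

# Hypothetical toy data standing in for the two frames from the question
IncorrectQuestions_df = pd.DataFrame({
    'Question1': [1, 1, 1, 1, 1, 2, 2, 3],
    'Question2': [7, 7, 7, 7, 7, 8, 8, 9],
})
IncorrectQuestions_count = pd.DataFrame({
    'Question1': [1], 'Question2': [7], 'Count': [5],
})

keys = ['Question1', 'Question2']

# Inner merge on the pair columns keeps exactly the rows whose pair appears
# in IncorrectQuestions_count; each pair occurs once there, so no row is duplicated
subset = IncorrectQuestions_df.merge(IncorrectQuestions_count[keys], on=keys, how='inner')
print(subset)
```

Note that a merge on columns returns a fresh 0..n-1 index rather than the original row labels; if those labels matter, the boolean-mask approach keeps them.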

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/42541603/
