python - Concatenate pandas DataFrames, keeping only rows with matching values in a column?

Tags: python pandas dataframe

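I am trying to do a "merge-concatenate" of two pandas DataFrames. Basically, I want to stack the two DataFrames, but keep only the rows in each DataFrame whose value in a given column also appears in the other DataFrame. For example: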

data1:

+---+------------+-----------+-------+
|   | first_name | last_name | class |
+---+------------+-----------+-------+
| 0 | Alex       | Anderson  |     1 |
| 1 | Amy        | Ackerman  |     2 |
| 2 | Allen      | Ali       |     3 |
| 3 | Alice      | Aoni      |     4 |
| 4 | Andrew     | Andrews   |     4 |
| 5 | Ayoung     | Atiches   |     5 |
+---+------------+-----------+-------+

data2:

+---+------------+-----------+-------+
|   | first_name | last_name | class |
+---+------------+-----------+-------+
| 0 | Billy      | Bonder    |     4 |
| 1 | Brian      | Black     |     5 |
| 2 | Bran       | Balwner   |     6 |
| 3 | Bryce      | Brice     |     7 |
| 4 | Betty      | Btisan    |     8 |
| 5 | Bruce      | Bronson   |     8 |
+---+------------+-----------+-------+

After performing this operation on data1 and data2, the resulting DataFrame should look like this:

result:

+---+------------+-----------+-------+
|   | first_name | last_name | class |
+---+------------+-----------+-------+
| 3 | Alice      | Aoni      |     4 |
| 4 | Andrew     | Andrews   |     4 |
| 5 | Ayoung     | Atiches   |     5 |
| 0 | Billy      | Bonder    |     4 |
| 1 | Brian      | Black     |     5 |
+---+------------+-----------+-------+

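In other words, I am trying to merge the two datasets and then stack the columns. I can think of a few ways to do this, but they are all hacky. I could merge data1 and data2 and then stack the columns, or use a map like this: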

map1 = data1['class'].map(lambda x: x in list(data2['class']))
map2 = data2['class'].map(lambda x: x in list(data1['class']))
pd.concat([data1[map1], data2[map2]])

But is there a more elegant solution?

Best answer

How about this?

In [335]: cls = np.intersect1d(data1['class'], data2['class'])

In [336]: cls
Out[336]: array([4, 5], dtype=int64)

In [337]: pd.concat([data1.loc[data1['class'].isin(cls)], data2.loc[data2['class'].isin(cls)]])
Out[337]:
  first_name last_name  class
3      Alice      Aoni      4
4     Andrew   Andrews      4
5     Ayoung   Atiches      5
0      Billy    Bonder      4
1      Brian     Black      5
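
For anyone who wants to reproduce the session above outside IPython, here is a minimal self-contained sketch. It rebuilds data1 and data2 from the tables in the question; the variable names `common` and `result` are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the tables in the question.
data1 = pd.DataFrame({
    "first_name": ["Alex", "Amy", "Allen", "Alice", "Andrew", "Ayoung"],
    "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Andrews", "Atiches"],
    "class": [1, 2, 3, 4, 4, 5],
})
data2 = pd.DataFrame({
    "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty", "Bruce"],
    "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan", "Bronson"],
    "class": [4, 5, 6, 7, 8, 8],
})

# Values of "class" that occur in both frames.
common = np.intersect1d(data1["class"], data2["class"])

# Keep only the rows whose class is shared, then stack the two pieces.
result = pd.concat([
    data1.loc[data1["class"].isin(common)],
    data2.loc[data2["class"].isin(common)],
])
print(result)
```

Each frame keeps its original index labels, which is why the stacked result contains 3, 4, 5 followed by 0, 1.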

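Or, with DataFrame.append (deprecated in pandas 1.4 and removed in 2.0, so this form only runs on older versions):

In [338]: data1.loc[data1['class'].isin(cls)].append(data2.loc[data2['class'].isin(cls)])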
Out[338]:
  first_name last_name  class
3      Alice      Aoni      4
4     Andrew   Andrews      4
5     Ayoung   Atiches      5
0      Billy    Bonder      4
1      Brian     Black      5
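
A possible variation, sketched here as an assumption rather than as part of the original answer: the intersection step can be skipped by filtering each frame directly against the other frame's column, and `ignore_index=True` gives the stacked result a fresh 0..n-1 index. The helper name `concat_matching` and the small `left` / `right` frames are hypothetical.

```python
import pandas as pd

def concat_matching(left, right, on):
    """Stack the rows of left and right whose `on` value appears in both."""
    # A row survives only if its key also occurs in the other frame.
    left_rows = left[left[on].isin(right[on])]
    right_rows = right[right[on].isin(left[on])]
    # ignore_index=True discards the original labels and renumbers the rows.
    return pd.concat([left_rows, right_rows], ignore_index=True)

# Tiny hypothetical frames, just to exercise the helper.
left = pd.DataFrame({"name": ["Alice", "Allen"], "class": [4, 3]})
right = pd.DataFrame({"name": ["Billy", "Bran"], "class": [4, 6]})

out = concat_matching(left, right, "class")
print(out)
```

Drop `ignore_index=True` if the original row labels should be kept, as in the outputs shown above.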

Source: Stack Overflow, https://stackoverflow.com/questions/40072950/
