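I'm trying to "merge-concatenate" two pandas DataFrames. Basically, I want to stack the two DataFrames but keep only the rows in each DataFrame whose value matches a value in the other DataFrame. For example: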
data1:
+---+------------+-----------+-------+
| | first_name | last_name | class |
+---+------------+-----------+-------+
| 0 | Alex | Anderson | 1 |
| 1 | Amy | Ackerman | 2 |
| 2 | Allen | Ali | 3 |
| 3 | Alice | Aoni | 4 |
| 4 | Andrew | Andrews | 4 |
| 5 | Ayoung | Atiches | 5 |
+---+------------+-----------+-------+
data2:
+---+------------+-----------+-------+
| | first_name | last_name | class |
+---+------------+-----------+-------+
| 0 | Billy | Bonder | 4 |
| 1 | Brian | Black | 5 |
| 2 | Bran | Balwner | 6 |
| 3 | Bryce | Brice | 7 |
| 4 | Betty | Btisan | 8 |
| 5 | Bruce | Bronson | 8 |
+---+------------+-----------+-------+
The DataFrame produced by performing this operation on data1 and data2 should then look like this:
result:
+---+------------+-----------+-------+
| | first_name | last_name | class |
+---+------------+-----------+-------+
| 3 | Alice | Aoni | 4 |
| 4 | Andrew | Andrews | 4 |
| 5 | Ayoung | Atiches | 5 |
| 0 | Billy | Bonder | 4 |
| 1 | Brian | Black | 5 |
+---+------------+-----------+-------+
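Basically, I'm trying to merge the two datasets and then stack the columns. I can think of a few ways to do this, but they are all hack-y. I could merge data1 and data2 and then stack the columns, or use a map like this: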
map1 = data1['class'].map(lambda x: x in list(data2['class']))
map2 = data2['class'].map(lambda x: x in list(data1['class']))
pd.concat([data1[map1], data2[map2]])
But is there a more elegant solution?
Best answer
How about this?
In [335]: cls = np.intersect1d(data1['class'], data2['class'])
In [336]: cls
Out[336]: array([4, 5], dtype=int64)
In [337]: pd.concat([data1.loc[data1['class'].isin(cls)], data2.loc[data2['class'].isin(cls)]])
Out[337]:
  first_name last_name  class
3      Alice      Aoni      4
4     Andrew   Andrews      4
5     Ayoung   Atiches      5
0      Billy    Bonder      4
1      Brian     Black      5
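Or, on pandas versions before 2.0 (DataFrame.append was removed in 2.0, so use the pd.concat form on newer versions):
In [338]: data1.loc[data1['class'].isin(cls)].append(data2.loc[data2['class'].isin(cls)])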
Out[338]:
  first_name last_name  class
3      Alice      Aoni      4
4     Andrew   Andrews      4
5     Ayoung   Atiches      5
0      Billy    Bonder      4
1      Brian     Black      5
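For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. It rebuilds the example frames from the question and skips the explicit np.intersect1d step by calling isin against the other frame's column directly, which should select the same rows. The helper name concat_matching and its parameters are hypothetical, chosen for illustration only.

```python
import pandas as pd

# Example frames copied from the question.
data1 = pd.DataFrame({
    "first_name": ["Alex", "Amy", "Allen", "Alice", "Andrew", "Ayoung"],
    "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Andrews", "Atiches"],
    "class": [1, 2, 3, 4, 4, 5],
})
data2 = pd.DataFrame({
    "first_name": ["Billy", "Brian", "Bran", "Bryce", "Betty", "Bruce"],
    "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan", "Bronson"],
    "class": [4, 5, 6, 7, 8, 8],
})


def concat_matching(left, right, on):
    """Hypothetical helper: stack the rows of both frames whose
    `on` value also occurs in the other frame."""
    # Keep rows of `left` whose key appears anywhere in `right`, and vice versa.
    left_rows = left[left[on].isin(right[on])]
    right_rows = right[right[on].isin(left[on])]
    # Stack the two filtered frames; original index labels are preserved.
    return pd.concat([left_rows, right_rows])


result = concat_matching(data1, data2, "class")
print(result)
```

If the original row labels are not needed, passing ignore_index=True to pd.concat gives the result a fresh 0..n-1 index instead.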
A similar question, "python - Concatenating pandas DataFrames keeping only rows with matching values in a column?", can be found on Stack Overflow: https://stackoverflow.com/questions/40072950/