python - Merging three or more DataFrames with Pandas

Tags: python pandas

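The Pandas merge function merges two DataFrames, but I need to merge N DataFrames, similar to a SQL statement that joins N tables with a full outer join. For example, I need to merge the three DataFrames below on 'type_1' and 'subject_id_1', 'type_2' and 'subject_id_2', and 'type_3' and 'subject_id_3'. Is this possible?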

import pandas as pd

raw_data = {
        'type_1': [1, 1, 0, 0, 1],
        'subject_id_1': ['1', '2', '3', '4', '5'],
        'first_name_1': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']}
df_a = pd.DataFrame(raw_data, columns = ['type_1', 'subject_id_1', 'first_name_1'])

raw_datab = {
        'type_2': [1, 1, 0, 0, 0],
        'subject_id_2': ['4', '5', '6', '7', '8'],
        'first_name_2': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']}
df_b = pd.DataFrame(raw_datab, columns = ['type_2', 'subject_id_2', 'first_name_2'])

raw_datac = {
        'type_3': [1, 1],
        'subject_id_3': ['4', '5'],
        'first_name_3': ['Joe', 'Paul']}
df_c = pd.DataFrame(raw_datac, columns = ['type_3', 'subject_id_3', 'first_name_3'])

# The third DataFrame (df_c) still needs to be included in this merge
merged = pd.merge(df_a, df_b, left_on=['type_1', 'subject_id_1'],
                  right_on=['type_2', 'subject_id_2'], how='outer')
print(merged)

Note: the fields to join on have different names in each DataFrame.

Best answer

I believe you need to turn the key columns of each DataFrame into an index with set_index and then join the frames with concat:

dfs = [df_a.set_index(['type_1','subject_id_1']),
       df_b.set_index(['type_2','subject_id_2']),
       df_c.set_index(['type_3','subject_id_3'])]

df = pd.concat(dfs, axis=1)
print(df)
    first_name_1 first_name_2 first_name_3
0 3        Allen          NaN          NaN
  4        Alice          NaN          NaN
  6          NaN         Bran          NaN
  7          NaN        Bryce          NaN
  8          NaN        Betty          NaN
1 1         Alex          NaN          NaN
  2          Amy          NaN          NaN
  4          NaN        Billy          Joe
  5       Ayoung        Brian         Paul

df = pd.concat(dfs, axis=1).rename_axis(('type','subject_id')).reset_index()
print(df)
   type subject_id first_name_1 first_name_2 first_name_3
0     0          3        Allen          NaN          NaN
1     0          4        Alice          NaN          NaN
2     0          6          NaN         Bran          NaN
3     0          7          NaN        Bryce          NaN
4     0          8          NaN        Betty          NaN
5     1          1         Alex          NaN          NaN
6     1          2          Amy          NaN          NaN
7     1          4          NaN        Billy          Joe
8     1          5       Ayoung        Brian         Paul
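
As an alternative to the concat approach, and not part of the answer above, here is a minimal sketch that renames the key columns to shared names and then chains outer merges with functools.reduce. The helper normalize_keys and the variable merged_all are hypothetical names introduced only for this illustration. This may be worth considering when a key pair can repeat within a frame, since index-based concat may not be applicable in that case.

```python
from functools import reduce

import pandas as pd

# Same sample data as in the question.
df_a = pd.DataFrame({
    'type_1': [1, 1, 0, 0, 1],
    'subject_id_1': ['1', '2', '3', '4', '5'],
    'first_name_1': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({
    'type_2': [1, 1, 0, 0, 0],
    'subject_id_2': ['4', '5', '6', '7', '8'],
    'first_name_2': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})
df_c = pd.DataFrame({
    'type_3': [1, 1],
    'subject_id_3': ['4', '5'],
    'first_name_3': ['Joe', 'Paul']})


def normalize_keys(df, type_col, id_col):
    """Hypothetical helper: rename a frame's key columns to shared names."""
    return df.rename(columns={type_col: 'type', id_col: 'subject_id'})


frames = [
    normalize_keys(df_a, 'type_1', 'subject_id_1'),
    normalize_keys(df_b, 'type_2', 'subject_id_2'),
    normalize_keys(df_c, 'type_3', 'subject_id_3'),
]

# Fold the list into one frame: each step is a full outer join on the shared keys.
merged_all = reduce(
    lambda left, right: pd.merge(left, right, on=['type', 'subject_id'], how='outer'),
    frames,
)
print(merged_all)
```

Because the list can hold any number of frames, the same reduce call extends to N DataFrames without further changes.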

This question about merging three or more DataFrames with Pandas corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/49750788/
