Python Pandas - Issue appending/concatenating two multi-index DataFrames

Tags: python pandas dataframe

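I am trying to combine two multi-index DataFrames. My code is below. As you can see in the output, the problem is that the 'DATE' index is repeated, whereas I want all values (OPEN_INT, PX_LAST) on the same date index. Any ideas? I have tried both append and concat, but they give similar results.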

if df.empty:
    # First request: fetch one field (f) for ticker t, indexed by date
    df = bbg_historicaldata(t, f, startDate, endDate)
    datesArray = list(df.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    # Build a (TICKER, DATE) MultiIndex for this single-column frame
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])
    df = pd.DataFrame({f: df[f].values}, index=index)

else:
    # Later requests: build the same kind of single-column frame in temp
    temp = bbg_historicaldata(t, f, startDate, endDate)
    datesArray = list(temp.index)
    tArray = [t for i in range(len(datesArray))]
    arrays = [tArray, datesArray]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])

    temp = pd.DataFrame({f: temp[f].values}, index=index)

    # df = df.append(temp, ignore_index=True)
    # Stacks rows (axis=0); sort_index() replaces sortlevel(), removed in pandas 1.0
    df = pd.concat([df, temp]).sort_index()

Result:

                        OPEN_INT  PX_LAST
TICKER      DATE                          
EDH8 COMDTY 2017-02-01        NaN   98.365
            2017-02-01  1008044.0      NaN
            2017-02-02        NaN   98.370
            2017-02-02  1009994.0      NaN
            2017-02-03        NaN   98.360
            2017-02-03  1019181.0      NaN
            2017-02-06        NaN   98.405
            2017-02-06  1023863.0      NaN
            2017-02-07        NaN   98.410
            2017-02-07  1024609.0      NaN
            2017-02-08        NaN   98.435
            2017-02-08  1046258.0      NaN
            2017-02-09        NaN   98.395

Essentially, I want to get this into a form where there are no NaNs!

Edit: adding axis=1 to concat produces the result below (my mistake for not including the append output in the first place).

                        PX_LAST   OPEN_INT  PX_LAST  OPEN_INT  PX_LAST  \
TICKER      DATE                                                         
EDH8 COMDTY 2017-02-01   98.365  1008044.0      NaN       NaN      NaN   
            2017-02-02   98.370  1009994.0      NaN       NaN      NaN   
            2017-02-03   98.360  1019181.0      NaN       NaN      NaN   
            2017-02-06   98.405  1023863.0      NaN       NaN      NaN   
            2017-02-07   98.410  1024609.0      NaN       NaN      NaN   
            2017-02-08   98.435  1046258.0      NaN       NaN      NaN   
            2017-02-09   98.395  1050291.0      NaN       NaN      NaN   
EDM8 COMDTY 2017-02-01      NaN        NaN   98.245  726739.0      NaN   
            2017-02-02      NaN        NaN   98.250  715081.0      NaN   
            2017-02-03      NaN        NaN   98.235  723936.0      NaN   
            2017-02-06      NaN        NaN   98.285  729324.0      NaN   
            2017-02-07      NaN        NaN   98.295  728673.0      NaN   
            2017-02-08      NaN        NaN   98.325  728520.0      NaN   
            2017-02-09      NaN        NaN   98.280  741840.0      NaN   
EDU8 COMDTY 2017-02-01      NaN        NaN      NaN       NaN   98.130   
            2017-02-02      NaN        NaN      NaN       NaN   98.135   
            2017-02-03      NaN        NaN      NaN       NaN   98.120   
            2017-02-06      NaN        NaN      NaN       NaN   98.180   
            2017-02-07      NaN        NaN      NaN       NaN   98.190   
            2017-02-08      NaN        NaN      NaN       NaN   98.225   
            2017-02-09      NaN        NaN      NaN       NaN   98.175  

Thanks!

Best answer

It is not clear what format the input is in.

I assume OPEN_INT looks like this:

import datetime
import pandas as pd


open_int = pd.DataFrame(
    [
        (datetime.date(2017, 2, 1), 1008044.0),
        (datetime.date(2017, 2, 2), 1009994.0),
        (datetime.date(2017, 2, 3), 1019181.0),
        (datetime.date(2017, 2, 6), 1023863.0),
        (datetime.date(2017, 2, 7), 1024609.0),
        (datetime.date(2017, 2, 8), 1046258.0),
    ],
    columns=['DATE', 'OPEN_INT']
)
open_int['TICKER'] = 'EDH8 COMDTY'
open_int.set_index(['TICKER', 'DATE'], inplace=True)

print(open_int)
#                          OPEN_INT
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0
#             2017-02-02  1009994.0
#             2017-02-03  1019181.0
#             2017-02-06  1023863.0
#             2017-02-07  1024609.0
#             2017-02-08  1046258.0

and PX_LAST looks like this:

px_last = pd.DataFrame(
    [
        (datetime.date(2017, 2, 1), 98.365),
        (datetime.date(2017, 2, 2), 98.370),
        (datetime.date(2017, 2, 3), 98.360),
        (datetime.date(2017, 2, 6), 98.405),
        (datetime.date(2017, 2, 7), 98.410),
        (datetime.date(2017, 2, 8), 98.435),
        (datetime.date(2017, 2, 9), 98.395),

    ],
    columns=['DATE', 'PX_LAST']
)
px_last['TICKER'] = 'EDH8 COMDTY'
px_last.set_index(['TICKER', 'DATE'], inplace=True)

print(px_last)
#                         PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01   98.365
#             2017-02-02   98.370
#             2017-02-03   98.360
#             2017-02-06   98.405
#             2017-02-07   98.410
#             2017-02-08   98.435
#             2017-02-09   98.395

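Then you concatenate them along the columns (axis=1), which aligns the rows on the shared (TICKER, DATE) index, and you get what you want: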

df = pd.concat([open_int, px_last], axis=1)
print(df)
#                          OPEN_INT  PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0   98.365
#             2017-02-02  1009994.0   98.370
#             2017-02-03  1019181.0   98.360
#             2017-02-06  1023863.0   98.405
#             2017-02-07  1024609.0   98.410
#             2017-02-08  1046258.0   98.435
#             2017-02-09        NaN   98.395
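
The same idea can probably be carried over to the loop in the question: combine the fields of one ticker side by side (axis=1), then stack the tickers on top of each other (axis=0). The following is only a minimal sketch under assumptions. fake_historicaldata is a hypothetical stand-in for bbg_historicaldata, whose real return format is not shown in the question; it is assumed to return one column indexed by date, and the values it produces are dummy placeholders. The last part shows one possible way to repair a frame that was already stacked row-wise, using groupby(...).first(), which keeps the first non-null value per column within each (TICKER, DATE) group.

```python
import pandas as pd


def fake_historicaldata(ticker, field, dates):
    """Hypothetical stand-in for bbg_historicaldata (assumed shape: one column, indexed by date)."""
    # Dummy placeholder values only; a real call would return market data.
    values = [float(i) for i in range(len(dates))]
    return pd.DataFrame({field: values}, index=pd.Index(dates, name='DATE'))


dates = pd.to_datetime(['2017-02-01', '2017-02-02', '2017-02-03'])
tickers = ['EDH8 COMDTY', 'EDM8 COMDTY']
fields = ['PX_LAST', 'OPEN_INT']

per_ticker = {}
for t in tickers:
    # Same ticker, different fields: align side by side on DATE.
    per_ticker[t] = pd.concat(
        [fake_historicaldata(t, f, dates) for f in fields], axis=1
    )

# Different tickers, same columns: stack rows.
# The dict keys become the outer (TICKER) index level.
df = pd.concat(per_ticker, names=['TICKER', 'DATE'])
print(df)

# Repairing a frame that was stacked row-wise, as in the question:
# each (TICKER, DATE) pair appears twice, with NaN in one of the columns.
stacked = pd.concat([df[['PX_LAST']], df[['OPEN_INT']]]).sort_index()

# first() skips NaN, so each group collapses to a single complete row.
repaired = stacked.groupby(level=['TICKER', 'DATE']).first()
print(repaired)
```

Building the wide frame per ticker first avoids the repeated PX_LAST/OPEN_INT columns seen in the question's edit, because concat with axis=1 does not merge columns that share a name. The groupby repair only gives a sensible result when each (TICKER, DATE) pair has at most one non-null value per column.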

Regarding "Python Pandas - Issue appending/concatenating two multi-index DataFrames", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42149584/
