python - Trying to read multiple .csv files into separate DataFrame columns

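I am reading several .csv files. Each file is a time series with the dates in the first column (which I want as the index) and the values in the second column. I can read the data in, but everything gets appended to the same column of the DataFrame, when I want each file to have its own column indexed by date: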

For example, if I have 3 files (in reality I have more than three):

csv1
1/1/2016,1.1
2/1/2016,1.2
3/1/2016,1.6

csv2
1/1/2016,4.6
2/1/2016,31.2
3/1/2016,1.8

csv3
2/1/2016,3.2
3/1/2016,5.8

Currently I get:

0        1 
1/1/2016 1.1
2/1/2016 1.2
3/1/2016 1.6
1/1/2016 4.6
2/1/2016 31.2
3/1/2016 1.8
2/1/2016 3.2
3/1/2016 5.8

What I want instead is:

0        1   2   3
1/1/2016 1.1 4.6 null
2/1/2016 1.2 31.2 3.2
3/1/2016 1.6 1.8 5.8

My current code looks like this:

def getData(rawDataPath): 
    big_frame = pd.DataFrame()
    path = rawDataPath
    allfiles = glob.glob(os.path.join(path,"*.csv"))


    np_array_list = []
    for file_ in allfiles:
        df = pd.read_csv(file_,index_col=None, header=0)
        np_array_list.append(df.as_matrix())

    comb_np_array = np.vstack(np_array_list)

    big_frame = big_frame.append(pd.DataFrame(comb_np_array))

    return big_frame

Best answer

Since you are already using pandas DataFrames, you might as well use pandas' join/merge functionality:

In [21]: csv1 = io.StringIO("""1/1/2016,1.1
2/1/2016,1.2
3/1/2016,1.6""")

In [22]: csv2 = io.StringIO("""1/1/2016,4.6
2/1/2016,31.2
3/1/2016,1.8""")

In [23]: csv3 = io.StringIO("""2/1/2016,3.2
3/1/2016,5.8""")

In [24]: df1 = pd.read_csv(csv1, header=None)

In [25]: df2 = pd.read_csv(csv2, header=None)

In [26]: df3 = pd.read_csv(csv3, header=None)

In [27]: pd.merge(pd.merge(df1, df2, on=0, how='outer'), df3, on=0, how='outer')
Out[27]: 
          0  1_x   1_y    1
0  1/1/2016  1.1   4.6  NaN
1  2/1/2016  1.2  31.2  3.2
2  3/1/2016  1.6   1.8  5.8

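The example uses how='outer', which means a full outer join. This was chosen in case some keys are missing from some of your files. If that is not the case, consider whichever join strategy suits your data best.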
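
To make the effect of the join type concrete, here is a minimal sketch that merges the first and third sample files both ways. The column labels "date", "a" and "c" are illustrative names chosen for this sketch, not part of the original data.

```python
import io

import pandas as pd

# Sample data from the question: csv3 has no row for 1/1/2016.
csv1 = io.StringIO("1/1/2016,1.1\n2/1/2016,1.2\n3/1/2016,1.6")
csv3 = io.StringIO("2/1/2016,3.2\n3/1/2016,5.8")

# "date", "a" and "c" are hypothetical column labels used only here.
df1 = pd.read_csv(csv1, header=None, names=["date", "a"])
df3 = pd.read_csv(csv3, header=None, names=["date", "c"])

# Outer join: keeps every date seen in either frame, filling gaps with NaN.
outer = pd.merge(df1, df3, on="date", how="outer")

# Inner join: keeps only the dates present in both frames.
inner = pd.merge(df1, df3, on="date", how="inner")

print(outer)
print(inner)
```

With how='outer' the 1/1/2016 row survives with a missing value in the second column, while how='inner' drops that row entirely.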

To combine all the files in a sensible way, you can use reduce:

In [30]: from functools import partial, reduce

In [31]: reduce(partial(pd.merge, on=0, how='outer'), [df1, df2, df3])
Out[31]: 
          0  1_x   1_y    1
0  1/1/2016  1.1   4.6  NaN
1  2/1/2016  1.2  31.2  3.2
2  3/1/2016  1.6   1.8  5.8
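
The outputs above carry automatically suffixed labels (1_x, 1_y, 1) because every frame uses the same column name. One way to avoid relying on suffixes is to give each frame a distinct column name before reducing. The sketch below assumes the labels "csv1", "csv2" and "csv3" and a key column called "date"; these names are illustrative.

```python
import io
from functools import partial, reduce

import pandas as pd

# In-memory stand-ins for the three sample files from the question.
raw = {
    "csv1": "1/1/2016,1.1\n2/1/2016,1.2\n3/1/2016,1.6",
    "csv2": "1/1/2016,4.6\n2/1/2016,31.2\n3/1/2016,1.8",
    "csv3": "2/1/2016,3.2\n3/1/2016,5.8",
}

# Each frame gets a shared key column ("date") and its own value column.
frames = [
    pd.read_csv(io.StringIO(text), header=None, names=["date", label])
    for label, text in raw.items()
]

# Fold the list pairwise with a full outer join on the shared key.
merged = reduce(partial(pd.merge, on="date", how="outer"), frames)
print(merged)
```

Because the value columns never collide, the merged frame keeps one clearly labelled column per source.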

Just replace the list with your own loaded DataFrames:

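import glob
import os
from functools import partial, reduce
import pandas as pd

def getData(rawDataPath):
    path = rawDataPath
    allfiles = glob.glob(os.path.join(path, "*.csv"))
    # Name each value column after its file so the merged columns stay distinct
    dataframes = (pd.read_csv(fname, header=None, names=['date', fname])
                  for fname in allfiles)
    return reduce(partial(pd.merge, on='date', how='outer'), dataframes)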
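
The question also asks for the dates to become the index. As an alternative to the merge-based answer, the sketch below uses pd.concat with axis=1, which aligns Series on their index. The helper name load_columns is hypothetical, and the day/month/year date format is an assumption, since the source does not say which order the dates use.

```python
import os
import tempfile

import pandas as pd


def load_columns(csv_paths):
    """Hypothetical helper: one column per file, aligned on a date index."""
    columns = []
    for path in csv_paths:
        # Use the bare file name (without extension) as the column label.
        name = os.path.splitext(os.path.basename(path))[0]
        frame = pd.read_csv(path, header=None, names=["date", name],
                            index_col="date")
        # Assumption: dates are day/month/year; adjust the format if not.
        frame.index = pd.to_datetime(frame.index, format="%d/%m/%Y")
        columns.append(frame[name])
    # axis=1 places each Series side by side, taking the union of the dates.
    return pd.concat(columns, axis=1).sort_index()


# Demonstration with the sample data written to temporary files.
samples = {
    "csv1": "1/1/2016,1.1\n2/1/2016,1.2\n3/1/2016,1.6\n",
    "csv2": "1/1/2016,4.6\n2/1/2016,31.2\n3/1/2016,1.8\n",
    "csv3": "2/1/2016,3.2\n3/1/2016,5.8\n",
}
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for label, text in samples.items():
        file_path = os.path.join(tmp, label + ".csv")
        with open(file_path, "w") as handle:
            handle.write(text)
        paths.append(file_path)
    result = load_columns(paths)

print(result)
```

This variant assumes each file has at most one row per date. Files with repeated dates would need to be deduplicated or aggregated first.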

This page on "python - Trying to read multiple .csv files into separate DataFrame columns" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/36518125/
