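I have several CSV files in one folder. I want to loop over every file ending in .csv and merge them all into a single Excel file. Here are two example files: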
First .csv file:
date a b
0 2019.1 1.0 NaN
1 2019.2 NaN 2.0
2 2019.3 3.0 2.0
3 2019.4 3.0 NaN
Second .csv file:
date c d
0 2019.1 1.0 NaN
1 2019.2 5.0 2.0
2 2019.3 3.0 7.0
3 2019.4 6.0 NaN
4 2019.5 NaN 10.0
...
The output I want looks like this, with the files merged on date:
date a b c d
0 2019/1/31 1.0 NaN 1.0 NaN
1 2019/2/28 NaN 2.0 5.0 2.0
2 2019/3/31 3.0 2.0 3.0 7.0
3 2019/4/30 3.0 NaN 6.0 NaN
4 2019/5/31 NaN NaN NaN 10.0
I wrote the following code, but the parts that convert date and merge the dfs are clearly not right:
import numpy as np
import pandas as pd
import glob
dfs = pd.DataFrame()
for file_name in glob.glob("*.csv"):
    # print(file_name)
    df = pd.read_csv(file_name, engine='python', skiprows=2, encoding='utf-8')
    df = df.dropna()
    df = df.dropna(axis = 1)
    df['date'] = pd.to_datetime(df['date'], format='%Y.%m')
...
dfs = pd.merge(df1, df2, on = 'date', how= "outer")
# save the data frame
writer = pd.ExcelWriter('output.xlsx')
dfs.to_excel(writer,'sheet1')
writer.save()
Please help me. Thanks.
Best answer
Try this:
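import numpy as np
import pandas as pd
import glob
from pandas.tseries.offsets import MonthEnd

dfs = pd.DataFrame()
for file_name in glob.glob("*.csv"):
    df = pd.read_csv(file_name, engine='python', skiprows=2, encoding='utf-8')
    # Normalize the header so every file exposes the key column as 'date'
    df.columns = df.columns.str.lower().str.replace('dates', 'date')
    # Drop rows, then columns, that contain any missing value
    df = df.dropna()
    df = df.dropna(axis = 1)
    # Parse 'YYYY.M' strings and move each date to the end of its month
    df['date'] = pd.to_datetime(df['date'].astype(str), format='%Y.%m') + MonthEnd(1)
    if dfs.empty:
        # First file: start the result from a copy of it
        dfs = df.copy()
    else:
        # Later files: outer-join on 'date' so no month is lost
        dfs = dfs.merge(df, on='date', how="outer")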
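
To make the merge step easier to check, here is a minimal, self-contained sketch that runs on in-memory copies of the two example tables from the question instead of real files. The names FIRST_CSV, SECOND_CSV and load_monthly are hypothetical, introduced only for this illustration. It also assumes comma-separated files with the header on the first line (so no skiprows), and it skips the dropna calls so that the NaN cells shown in the desired output survive. Reading date as a string is a precaution of this sketch: a value such as 2019.10 would otherwise be parsed as the float 2019.1 and become indistinguishable from January.

```python
from functools import reduce
from io import StringIO

import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hypothetical in-memory stand-ins for the two CSV files in the question.
FIRST_CSV = """date,a,b
2019.1,1.0,
2019.2,,2.0
2019.3,3.0,2.0
2019.4,3.0,
"""
SECOND_CSV = """date,c,d
2019.1,1.0,
2019.2,5.0,2.0
2019.3,3.0,7.0
2019.4,6.0,
2019.5,,10.0
"""


def load_monthly(source):
    """Read one CSV and turn its 'date' column into month-end timestamps."""
    # Keep 'date' as text so "2019.10" is not collapsed into the float 2019.1.
    df = pd.read_csv(source, dtype={"date": str})
    # '%Y.%m' yields the first day of the month; MonthEnd(1) moves it to the last.
    df["date"] = pd.to_datetime(df["date"], format="%Y.%m") + MonthEnd(1)
    return df


frames = [load_monthly(StringIO(text)) for text in (FIRST_CSV, SECOND_CSV)]

# Fold the list of frames into one with successive outer joins on 'date'.
merged = reduce(
    lambda left, right: left.merge(right, on="date", how="outer"),
    frames,
)
merged = merged.sort_values("date").reset_index(drop=True)
print(merged)
```

With real files, the list could instead be built as frames = [load_monthly(p) for p in sorted(glob.glob("*.csv"))]. The merged frame can then be written with merged.to_excel('output.xlsx', sheet_name='sheet1', index=False), which needs an Excel engine such as openpyxl installed. Note that writer.save() as used in the question is no longer available in pandas 2.0 and later.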
Regarding "python - Loop over Excel files and merge them on a common column in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58537324/