python - Merging data from multiple Excel files into one sheet

I have multiple Excel files in the following format:

ID | Name | Prop1 | Prop2 | Prop3 | User

Data from Excel1:

ID | Name | Prop1 | Prop2 | Prop3 | User 
1  | test |       |       |       | John

Data from Excel2:

ID | Name | Prop1 | Prop2 | Prop3 | User
1  | test |   a   |   b   |       | John

Data from Excel3:

ID | Name | Prop1 | Prop2 | Prop3 | User
1  | test |       |       |   c   | John

What I want to do is combine these cells into a single row.

Expected output:

ID | Name | Prop1 | Prop2 | Prop3 | User
1  | test |   a   |   b   |   c   | John

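If a cell is empty in one file and another file has a value for it, I want the empty cell replaced with that value.

Is there a simple way to do this?

Thanks.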

Best answer

You can use glob to build a list of all the DataFrames, then produce the final df by applying combine_first with reduce:

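import glob
from functools import reduce

import pandas as pd

# Collect every workbook in the files/ directory
files = glob.glob('files/*.xlsx')
# Index each frame by the key columns so that rows line up across files
dfs = [pd.read_excel(fp).set_index(['ID','Name','User']) for fp in files]

# Fill the missing cells of the accumulated frame with values from the next frame
df1 = reduce(lambda left, right: left.combine_first(right), dfs)
print(df1)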
             Prop1 Prop2 Prop3
ID Name User                  
1  test John     a     b     c
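
To try the approach above without any files on disk, here is a minimal sketch that rebuilds the three sample sheets from the question as in-memory DataFrames. The names `excel1`, `excel2`, `excel3`, `keys`, and `result` are only illustrative, and the sketch assumes that empty cells are read in as NaN, which is how `pd.read_excel` treats blank cells by default.

```python
from functools import reduce

import numpy as np
import pandas as pd

cols = ["ID", "Name", "Prop1", "Prop2", "Prop3", "User"]

# Stand-ins for the three workbooks in the question (empty cells become NaN)
excel1 = pd.DataFrame([[1, "test", np.nan, np.nan, np.nan, "John"]], columns=cols)
excel2 = pd.DataFrame([[1, "test", "a", "b", np.nan, "John"]], columns=cols)
excel3 = pd.DataFrame([[1, "test", np.nan, np.nan, "c", "John"]], columns=cols)

# Index by the identifying columns so combine_first aligns matching rows
keys = ["ID", "Name", "User"]
dfs = [df.set_index(keys) for df in (excel1, excel2, excel3)]

# Each step keeps existing values and fills only the NaN cells from the next frame
combined = reduce(lambda left, right: left.combine_first(right), dfs)

# Restore the original column layout
result = combined.reset_index()[cols]
print(result)
```

Note that combine_first keeps the first non-missing value it encounters, so if two files hold different values for the same cell, the file that comes earlier in the list wins.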

Edit: If the files do not need to be combined by filling in NaN values, the solution is simpler:

import glob

import pandas as pd

files = glob.glob('files/*.xlsx')
# Stack the rows of every file and renumber the index from 0
df = pd.concat([pd.read_excel(fp) for fp in files],ignore_index=True)
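
As a rough illustration of how concat differs from the NaN-filling approach, the sketch below stacks two small in-memory frames. The frames `part1` and `part2` and their second-row values are made up for the example and do not come from the question.

```python
import pandas as pd

# Hypothetical sample data: each frame stands in for one workbook
part1 = pd.DataFrame({"ID": [1], "Name": ["test"], "User": ["John"]})
part2 = pd.DataFrame({"ID": [2], "Name": ["other"], "User": ["Jane"]})

# concat appends rows; it does not merge cells that belong to the same ID
stacked = pd.concat([part1, part2], ignore_index=True)
print(stacked)
```

This fits when every file contributes different rows. When the same row appears in several files with different cells filled in, concat would leave one partial row per file, which is the case the combine_first approach handles.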

This question, python - Merging data from multiple Excel files into one sheet, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/49608419/
