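I have an Excel file with multiple worksheets that I need to merge. However, the column headers differ from sheet to sheet. The data currently looks like this.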
Sheet 1
+-------------+--------------+----------+--------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 |
+-------------+--------------+----------+--------+---------+---------+
| 17 | Data | Data | 0 | 0 | 0 |
| 17 | Data | Data | 0 | 0 | 0 |
+-------------+--------------+----------+--------+---------+---------+
Sheet 2
+-------------+--------------+----------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header3 | Header2 |
+-------------+--------------+----------+---------+---------+
| 15 | Data | Data | 0 | 0 |
| 15 | Data | Data | 0 | 0 |
+-------------+--------------+----------+---------+---------+
Sheet 3
+-------------+--------------+----------+---------+---------+---------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header4 | Header1 | Header3 |
+-------------+--------------+----------+---------+---------+---------+
| 16 | Data | Data | 0 | 0 | 0 |
| 16 | Data | Data | 0 | 0 | 0 |
+-------------+--------------+----------+---------+---------+---------+
OUTPUT
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
| FISCAL_YEAR | COMPANY_CODE | ACCOUNTS | Header | Header1 | Header2 | Header3 | Header4 | SheetName |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
| 17 | Data | Data | 0 | 0 | 0 | null | null | Sheet1 |
| 17 | Data | Data | 0 | 0 | 0 | null | null | Sheet1 |
| 15 | Data | Data | null | null | 0 | 0 | null | Sheet2 |
| 15 | Data | Data | null | null | 0 | 0 | null | Sheet2 |
| 16 | Data | Data | null | 0 | null | 0 | 0 | Sheet3 |
| 16 | Data | Data | null | 0 | null | 0 | 0 | Sheet3 |
+-------------+--------------+----------+--------+---------+---------+---------+---------+-----------+
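I am fairly new to Python. I have used Pandas and numpy. I have as many as 60 sheets to work with. Can anyone help me understand how to achieve this? If not Python, should I be using some other tool or method? A code example to start from would really help.
Any help is much appreciated. Thanks in advance.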
Best answer
With R, this is easy to do.
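library(openxlsx) # to read xlsx files
library(purrr)    # for the "map" functions
# Load the workbook once and list its sheet names
wb <- loadWorkbook("path/filename.xlsx")
all_sheets <- names(wb)
# Read each sheet and row-bind the results; columns are matched by name,
# and a column missing from a sheet is filled with NA
merged_data <- map_df(all_sheets, ~ read.xlsx(wb, sheet = .x))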
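Since the question asks about Python, here is a minimal pandas sketch of the same idea, offered as a starting point rather than a definitive solution. The function names `merge_sheets` and `merge_workbook` are hypothetical, and the file path is whatever your workbook is called. It relies on `pd.read_excel(..., sheet_name=None)` returning a dict of all sheets and on `pd.concat` aligning columns by name. Reading `.xlsx` files with pandas typically also requires the `openpyxl` package.

```python
import pandas as pd


def merge_sheets(sheets):
    """Stack a {sheet_name: DataFrame} mapping into one DataFrame.

    Columns are aligned by name; a column that a sheet lacks is filled
    with NaN for that sheet's rows. (Hypothetical helper for illustration.)
    """
    # Tag every row with the sheet it came from
    frames = [df.assign(SheetName=name) for name, df in sheets.items()]
    merged = pd.concat(frames, ignore_index=True, sort=False)
    # Move SheetName to the last position, as in the desired OUTPUT table
    cols = [c for c in merged.columns if c != "SheetName"] + ["SheetName"]
    return merged[cols]


def merge_workbook(path):
    """Read every sheet of an Excel workbook and merge them.

    sheet_name=None makes read_excel return a dict keyed by sheet name.
    """
    return merge_sheets(pd.read_excel(path, sheet_name=None))
```

Usage would be along the lines of `merged = merge_workbook("path/filename.xlsx")`, followed by `merged.to_excel(...)` or `merged.to_csv(...)` to save the result. Splitting the merge step from the file reading keeps the logic easy to try on small in-memory DataFrames before running it on all 60 sheets.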
Source: a similar question on Stack Overflow, "python - Merging data with different headers in Python or R": https://stackoverflow.com/questions/49846754/