python - Pandas not reading the first column from an xlsx file

Tags: python python-3.x pandas

I am working with an Excel file that contains 36 sheets, one per month (Sep 18 back to Oct 15), and I read all of them into a dictionary:

import pandas as pd

fileName = 'project_dropColumnICSv2.xlsx'
df = pd.ExcelFile(fileName)

sheetNames = df.sheet_names
vars_dict = {}

for sheetName in sheetNames:
    vars_dict["the_{0}".format(sheetName)] = pd.read_excel(fileName, sheet_name=sheetName, index_col=False)

mykeys = []

for key, value in vars_dict.items():
    mykeys.append(key)

I need to set the same 14 column names on all of them at once, but I get ValueError: Length mismatch.

Here we can see that some sheets contain only 13 columns:

for mykey in mykeys:
    print("'{}' contains {} columns".format((mykey), len(vars_dict.get(mykey).columns)))

'the_Sep 18' contains 14 columns
'the_Aug 18' contains 14 columns
'the_Jul 18' contains 14 columns
'the_Jun 18' contains 14 columns
'the_May 18' contains 14 columns
'the_April 18' contains 14 columns
'the_March 18' contains 14 columns
'the_February 18' contains 13 columns
'the_January 18' contains 14 columns
'the_December 17' contains 13 columns
'the_November 17' contains 13 columns
'the_October 17' contains 13 columns
'the_September 17' contains 13 columns
'the_August 17' contains 14 columns
'the_July 17' contains 14 columns
'the_June 17' contains 14 columns
'the_May 17' contains 14 columns
'the_April 17' contains 14 columns
'the_MARCH 17' contains 14 columns
'the_February17' contains 14 columns
'the_January17' contains 14 columns
'the_December16' contains 14 columns
'the_November16' contains 14 columns
'the_October 16' contains 14 columns
'the_September' contains 14 columns
'the_August' contains 15 columns
'the_July' contains 14 columns
'the_June' contains 14 columns
'the_May' contains 14 columns
'the_April' contains 14 columns
'the_March' contains 13 columns
'the_February' contains 13 columns
'the_January' contains 13 columns
'the_December' contains 13 columns
'the_November' contains 14 columns
'the_October' contains 13 columns

I tried adding another column:

for mykey in mykeys:
    if len(vars_dict.get(mykey).columns) == 13:
        vars_dict.get(mykey)['Another Column'] = 'Nan'

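and then changing the column names in a for loop, but the result has the wrong fields in the first column. In short, the data is misaligned.

Assuming column is an array of my column names, how can I make this work?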

for mykey in mykeys:
    vars_dict.get(mykey).columns = column

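P.S. One sheet contains 15 columns; simply dropping the last one fixes that.

Best answer

I think you need the parameter sheet_name=None in read_excel to convert all sheets into a dictionary of DataFrames (an OrderedDict in older pandas versions):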

fileName = 'project_dropColumnICSv2.xlsx'
dfs = pd.read_excel(fileName, sheet_name=None)

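Then use a dictionary comprehension to check the number of columns, add a new column with assign where a sheet has only 13, and build a new dictionary:

import numpy as np
dfs = {k: v.assign(New=np.nan) if len(v.columns) == 13 else v for k, v in dfs.items()}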

If you need to change the keys:

dfs = {f'the_{k}': v.assign(New=np.nan)
       if len(v.columns) == 13
       else v for k, v in dfs.items()}
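
The padding step can be tried without the workbook. The following is a minimal sketch that uses two made-up in-memory frames as stand-ins for sheets (the names `sheets` and `padded` and the dummy data are assumptions, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for two sheets: one with 14 columns, one with 13.
sheets = {
    "Sep 18": pd.DataFrame(np.zeros((2, 14))),
    "February 18": pd.DataFrame(np.zeros((2, 13))),
}

# Pad only the 13-column frames with an all-NaN column named "New";
# frames that already have 14 columns are passed through unchanged.
padded = {
    f"the_{name}": frame.assign(New=np.nan) if len(frame.columns) == 13 else frame
    for name, frame in sheets.items()
}

for name, frame in padded.items():
    print(name, len(frame.columns))
```

Note that assign appends the new column at the end, so it only lines up with the target names if the missing column really is the last one.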

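Then select each DataFrame by its key (the original sheet name here; use 'the_Sep 18' instead if the keys were renamed):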

print (dfs['Sep 18'])
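
The misalignment described in the question would be consistent with the missing column being the first one, as the title suggests, rather than the last. That is an assumption that should be checked against each sheet. Under that assumption, a rough sketch of bringing every frame to 14 columns before renaming could look like this (the helper `normalize`, the placeholder names in `column`, and the dummy frames are all hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical target names; the question's actual 14 names are not shown.
column = [f"col_{i}" for i in range(14)]


def normalize(frame, names, missing_position=0):
    """Return a copy of `frame` with exactly len(names) columns, named `names`."""
    # Drop surplus trailing columns (the 15-column sheet in the question).
    out = frame.iloc[:, :len(names)].copy()
    while len(out.columns) < len(names):
        # Insert an all-NaN placeholder where the column is assumed to be missing.
        out.insert(missing_position, f"_missing_{len(out.columns)}", np.nan)
    out.columns = names
    return out


# Dummy frames standing in for sheets with 14, 13 and 15 columns.
sheets = {
    "Sep 18": pd.DataFrame(np.ones((2, 14))),
    "February 18": pd.DataFrame(np.ones((2, 13))),
    "August": pd.DataFrame(np.ones((2, 15))),
}

renamed = {name: normalize(frame, column) for name, frame in sheets.items()}

for name, frame in renamed.items():
    print(name, list(frame.columns)[:3], len(frame.columns))
```

If the missing column turns out to be elsewhere, or differs from sheet to sheet, `missing_position` would have to be chosen per sheet instead of being fixed at 0.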

Regarding "python - Pandas not reading the first column from an xlsx file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52529948/
