python - PyCharm variable explorer does not show Pandas column names that contain spaces

I have an .xlsx file that I can read successfully with:

pandas.read_excel(file_name, sheet_name="customers", index_col=0)

This works for most columns, but a few columns have spaces between characters, for example "Profile Url". Those columns are simply missing.

Edit:

Here is some code that reproduces the problem:

import pandas as pd

def read_excel(file_name):
    df = pd.read_excel(file_name, sheet_name="customers", index_col=0)
    for entry in df.iterrows():
        print(entry)
    return df


read_excel("test_table.xlsx")

Here is a sample table to use:

ID,First,Last,Profile Url
1,foo,bar,www.google.com
2,fake,name,https://stackoverflow.com/
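
For readers who do not have the .xlsx file, the sample table can also be loaded from text. This is only a sketch: it assumes the table is given as CSV and uses pd.read_csv instead of the original pd.read_excel call, and the name SAMPLE_CSV is made up for illustration. Printing each column name with repr() makes any hidden leading or trailing whitespace visible.

```python
import io

import pandas as pd

# Assumption: the sample table as CSV text, standing in for the "customers" sheet.
SAMPLE_CSV = """ID,First,Last,Profile Url
1,foo,bar,www.google.com
2,fake,name,https://stackoverflow.com/
"""

# index_col=0 uses the ID column as the index, as in the original read_excel call.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=0)

# repr() shows the exact column names, including any surrounding whitespace.
for name in df.columns:
    print(repr(name))
```

If "Profile Url" is printed here, pandas has read the column and the gap is in how the IDE displays it.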

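This is the value of entry in the first iteration. There I can see First and Last.

I would expect to see Profile Url as well.

What I learned while preparing this example is that any header written in lowercase is ignored too.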

Best answer

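The behavior is not tied to any particular file type. It occurs for any dataframe with spaces in its column names, regardless of how the dataframe was created.
There is currently an open issue with JetBrains about this behavior.
The solution is to fix the columns by replacing the spaces with another character, such as '_'.
Lowercase column names do not show the same issue. My guess is that those column names have leading or trailing whitespace, which can be removed with .str.strip()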

import pandas as pd

df = pd.DataFrame({'col_no_spaces': [1, 2, 3], 'col with spaces': ['a', 'b', 'c'], ' col_with_leading_trailing_ws ': [4, 5, 6]})

# display(df)
   col_no_spaces col with spaces   col_with_leading_trailing_ws 
0              1               a                               4
1              2               b                               5
2              3               c                               6

Note that the columns with spaces are not available under View as Series

# strip leading and trailing whitespace, then replace runs of whitespace in column names with _
# (raw string so that \s is passed to the regex engine unchanged)
df.columns = df.columns.str.strip().str.replace(r'\s+', '_', regex=True)

# display(df)
   col_no_spaces col_with_spaces  col_with_leading_trailing_ws
0              1               a                             4
1              2               b                             5
2              3               c                             6

Note that all columns are now available under View as Series
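
The same renaming can be wrapped in a small function that leaves the original dataframe untouched. This is a sketch, not part of pandas or PyCharm: the helper name clean_column_names and the example frame are hypothetical, and it assumes all column names are strings.

```python
import pandas as pd


def clean_column_names(df):
    """Hypothetical helper: return a copy whose column names are stripped
    and have inner whitespace runs replaced by underscores."""
    cleaned = df.columns.str.strip().str.replace(r"\s+", "_", regex=True)
    out = df.copy()
    out.columns = cleaned
    return out


# Illustrative frame with one inner space and one padded name.
raw = pd.DataFrame({"col with spaces": ["a", "b"], " padded ": [1, 2]})

# Names that would change, i.e. the ones likely to be affected in the viewer.
affected = [name for name in raw.columns if name != name.strip() or " " in name]

fixed = clean_column_names(raw)
print(affected)
print(list(fixed.columns))
```

Keeping the original frame intact lets the original headers be restored before writing the data back to a file.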

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/64177710/
