python - Reading a csv with pandas when the header line is commented

My CSV file has a # in the header line:

s = '#one two three\n1 2 3'

If I use pd.read_csv, the # sign ends up in the first column name:

import pandas as pd
from io import StringIO
pd.read_csv(StringIO(s), delim_whitespace=True)
     #one  two  three
0     1    2      3

If I set the parameter comment='#', pandas ignores that line entirely.

Is there a simple way to handle this case?

A second, related question is how to handle quoting in this case. Without the # it works:

s = '"one one" two three\n1 2 3'
print(pd.read_csv(StringIO(s), delim_whitespace=True))
   one one  two  three
0        1    2      3

With the # it does not:

s = '#"one one" two three\n1 2 3'
print(pd.read_csv(StringIO(s), delim_whitespace=True))
   #"one  one"  two  three
0      1     2    3    NaN

Thanks!

++++++++++ Update

Here is a test for the second example.

s = '#"one one" two three\n1 2 3'
# here I am cheating slicing the string
wanted_result = pd.read_csv(StringIO(s[1:]), delim_whitespace=True)
# is there a way to achieve the same result configuring somehow read_csv?
assert wanted_result.equals(pd.read_csv(StringIO(s), delim_whitespace=True))

Best answer

You can rename the first header of the read_csv() output like this:

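import pandas as pd
from io import StringIO

s = '#one two three\n1 2 3'
# sep=r"\s+" is equivalent to delim_whitespace=True (deprecated since pandas 2.2)
df = pd.read_csv(StringIO(s), sep=r"\s+")
# Strip the leading "#" from the first column name
new_name = df.columns[0].lstrip("#")
# rename() returns a new DataFrame, so assign the result
df = df.rename(columns={df.columns[0]: new_name})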
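Renaming a column after the fact does not cover the quoted-header case from the question, because by the time read_csv returns, the header has already been split in the wrong places. One possible alternative is to strip the marker before parsing. What follows is a minimal sketch, assuming the input is a text file-like object and the # only needs to be removed from the very first line. The helper name read_csv_commented_header is hypothetical and not part of pandas.

```python
from io import StringIO

import pandas as pd


def read_csv_commented_header(buf, **kwargs):
    """Hypothetical helper: parse a CSV whose header line starts with '#'.

    Assumes `buf` is a text file-like object positioned at the header line.
    """
    # Read only the first line and drop the leading '#' (and any spaces after it).
    header = buf.readline().lstrip("#").lstrip(" ")
    # Re-attach the cleaned header to the untouched remainder of the data.
    return pd.read_csv(StringIO(header + buf.read()), **kwargs)


s = '#"one one" two three\n1 2 3'
df = read_csv_commented_header(StringIO(s), sep=r"\s+")
print(df)
```

This is the same idea as slicing the string with s[1:] in the question's update, but it also works on an open file handle. Note that it reads the whole input into memory, so for very large files it may be preferable to parse the header line separately and pass the names via the names parameter together with skiprows=1.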

This question and its answer were found on Stack Overflow: https://stackoverflow.com/questions/30311776/
