I have a file containing data like this:
2.10.2014 23:30:00,"25,1",nan,nan,nan
2.10.2014 23:30:00,nan,"15,2",nan,nan
2.10.2014 23:30:00,nan,nan,"125,14",nan
2.10.2014 23:45:00,nan,0,nan,nan
I want to read this file. The expected output:
2.10.2014 23:30:00 25.1 nan nan nan
2.10.2014 23:30:00 nan 15.2 nan nan
2.10.2014 23:30:00 nan nan 125.14 nan
2.10.2014 23:45:00 nan 0 nan nan
It is important to note that when the value is 0, the quotes disappear.
At the moment my code looks like this:
import pandas as pd
import csv
df=pd.read_csv("file.csv",
sep=',\s+',
quoting=csv.QUOTE_NONE,
header=None,
encoding="mbcs")
Result:
"2.10.2014 23:30:00,""25,1"",nan,nan,nan"
Instead of quoting=csv.QUOTE_NONE I also tried using escapechar='"'.
Best answer
Pass decimal=',' to read_csv:
In [28]:
import io
import pandas as pd
t="""2.10.2014 23:30:00,"25,1",nan,nan,nan
2.10.2014 23:30:00,nan,"15,2",nan,nan
2.10.2014 23:30:00,nan,nan,"125,14",nan
2.10.2014 23:45:00,nan,0,nan,nan"""
pd.read_csv(io.StringIO(t), decimal=',', header=None)
Out[28]:
0 1 2 3 4
0 2.10.2014 23:30:00 25.1 NaN NaN NaN
1 2.10.2014 23:30:00 NaN 15.2 NaN NaN
2 2.10.2014 23:30:00 NaN NaN 125.14 NaN
3 2.10.2014 23:45:00 NaN 0.0 NaN NaN
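As a rough sketch of why the default quoting matters here (the names `good` and `bad` are only illustrative, not from the question): with the default quoting, "25,1" is kept as a single field, which decimal=',' can then convert to a float. With quoting=csv.QUOTE_NONE the quote characters are treated as ordinary text, so the comma inside the quotes acts as a field separator.

```python
import csv
import io

import pandas as pd

t = """2.10.2014 23:30:00,"25,1",nan,nan,nan
2.10.2014 23:30:00,nan,"15,2",nan,nan
2.10.2014 23:30:00,nan,nan,"125,14",nan
2.10.2014 23:45:00,nan,0,nan,nan"""

# Default quoting: "25,1" stays one field and decimal=',' parses it as 25.1.
good = pd.read_csv(io.StringIO(t), decimal=',', header=None)

# QUOTE_NONE: quotes are ordinary characters, so "25,1" is split at its comma.
bad = pd.read_csv(io.StringIO(t), quoting=csv.QUOTE_NONE, header=None)

print(good.shape)   # rows x columns with quoted fields kept intact
print(bad.shape)    # one extra column caused by the split quoted field
print(good.dtypes)  # the value columns should be numeric
```

Comparing the two shapes is a quick way to check whether quoted fields are being honored when a file loads with an unexpected number of columns.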
In addition, you can pass parse_dates=[0] to have the first column interpreted as datetime:
In [31]:
pd.read_csv(io.StringIO(t), decimal=',', header=None, parse_dates=[0])
Out[31]:
0 1 2 3 4
0 2014-02-10 23:30:00 25.1 NaN NaN NaN
1 2014-02-10 23:30:00 NaN 15.2 NaN NaN
2 2014-02-10 23:30:00 NaN NaN 125.14 NaN
3 2014-02-10 23:45:00 NaN 0.0 NaN NaN
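Note that the output above reads 2.10.2014 month-first, as 10 February 2014. If your timestamps are actually day-first (2 October 2014), which is an assumption about your data that I cannot verify from the sample, one option is to skip parse_dates and convert the column yourself with an explicit format string. A minimal sketch:

```python
import io

import pandas as pd

t = """2.10.2014 23:30:00,"25,1",nan,nan,nan
2.10.2014 23:30:00,nan,"15,2",nan,nan
2.10.2014 23:30:00,nan,nan,"125,14",nan
2.10.2014 23:45:00,nan,0,nan,nan"""

# Read the numbers with a comma decimal separator; leave column 0 as text.
df = pd.read_csv(io.StringIO(t), decimal=',', header=None)

# Assumption: the timestamps are day.month.year, so state the format explicitly
# instead of relying on automatic date inference.
df[0] = pd.to_datetime(df[0], format="%d.%m.%Y %H:%M:%S")

print(df)
```

read_csv also accepts dayfirst=True together with parse_dates=[0]; the explicit format is used here only because it leaves nothing to inference.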
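In your case, ignore the io.StringIO part; it is only there so that I could load your data from a text string. Also drop sep=',\s+' and quoting=csv.QUOTE_NONE, since with them the quoted fields are not parsed as single values, and just do:
df = pd.read_csv("file.csv", header=None, decimal=',', parse_dates=[0], encoding="mbcs")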
Regarding python - reading a CSV (comma-separated file) with values in quotes and a comma as the decimal separator, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32369679/