I am trying to import a CSV file using pandas.read_csv. The file looks like this:
"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
"ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"
"ROW4COLA","ROW4COLB","ROW4COLC","ROW4COLD"
"ROW5COLA","ROW5COLB","ROW5COLC","ROW5COLD"
"ROW6COLA","ROW6COLB","ROW6COLC","ROW6COLD"
"ROW7COLA","ROW7COLB","ROW7COLC","ROW7COLD"
On my first attempt I ran:
data = pd.read_csv('broken.csv')
I got:
COL_A COL_B COL_C
ROW1COLA ROW1COLB ROW1COLC ROW1COLD
ROW2COLA ROW2COLB ROW2COLC ROW2COLD
ROW3COLA ROW3COLB ROW3COLC ROW3COLD
ROW4COLA ROW4COLB ROW4COLC ROW4COLD
ROW5COLA ROW5COLB ROW5COLC ROW5COLD
ROW6COLA ROW6COLB ROW6COLC ROW6COLD
ROW7COLA ROW7COLB ROW7COLC ROW7COLD
Setting index_col=False:
data = pd.read_csv('broken.csv',index_col=False)
I got:
COL_A COL_B COL_C
0 ROW1COLA ROW1COLB ROW1COLC
1 ROW2COLA ROW2COLB ROW2COLC
2 ROW3COLA ROW3COLB ROW3COLC
3 ROW4COLA ROW4COLB ROW4COLC
4 ROW5COLA ROW5COLB ROW5COLC
5 ROW6COLA ROW6COLB ROW6COLC
6 ROW7COLA ROW7COLB ROW7COLC
If I add prefix='X':
data = pd.read_csv('broken.csv',index_col=False,prefix='X')
I get:
COL_A COL_B COL_C
0 ROW1COLA ROW1COLB ROW1COLC
1 ROW2COLA ROW2COLB ROW2COLC
2 ROW3COLA ROW3COLB ROW3COLC
3 ROW4COLA ROW4COLB ROW4COLC
4 ROW5COLA ROW5COLB ROW5COLC
5 ROW6COLA ROW6COLB ROW6COLC
6 ROW7COLA ROW7COLB ROW7COLC
The same happens with read_table:
data = pd.read_table('broken.csv',index_col=True,sep=',')
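I would like to know whether there is any way to have pandas assign the headers automatically and still pick up the values in the column whose header is missing.
Best answer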
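I think you can use read_csv with the parameter header=0, so that the first row is treated as the header and is then overridden by the custom column names passed in the names parameter. The parameter sep=',' is omitted because it is the default: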
import pandas as pd
import io
temp=u'''"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
"ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"
"ROW4COLA","ROW4COLB","ROW4COLC","ROW4COLD"
"ROW5COLA","ROW5COLB","ROW5COLC","ROW5COLD"
"ROW6COLA","ROW6COLB","ROW6COLC","ROW6COLD"
"ROW7COLA","ROW7COLB","ROW7COLC","ROW7COLD"'''
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), header=0, names=['a','b','c','d'])
print(df)
a b c d
0 ROW1COLA ROW1COLB ROW1COLC ROW1COLD
1 ROW2COLA ROW2COLB ROW2COLC ROW2COLD
2 ROW3COLA ROW3COLB ROW3COLC ROW3COLD
3 ROW4COLA ROW4COLB ROW4COLC ROW4COLD
4 ROW5COLA ROW5COLB ROW5COLC ROW5COLD
5 ROW6COLA ROW6COLB ROW6COLC ROW6COLD
6 ROW7COLA ROW7COLB ROW7COLC ROW7COLD
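If you would rather keep the names that are already in the header and only invent names for the extra columns, the names list can be built from the file itself instead of being hard-coded. This is a minimal sketch, not a built-in pandas feature: the helper read_padded and its filler argument are hypothetical names introduced here, and it assumes the text is small enough to scan twice.

```python
import csv
import io

import pandas as pd

temp = u'''"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
"ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"'''


def read_padded(text, filler="X"):
    """Hypothetical helper: pad a too-short header, then parse with read_csv."""
    # First pass: find the header fields and the widest row in the data.
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    width = max(len(row) for row in rows)
    # Keep the existing names and add filler names (by column position)
    # for every column the header does not cover.
    names = header + [filler + str(i) for i in range(len(header), width)]
    # Second pass: header=0 consumes the original header row,
    # and names replaces it with the padded list.
    return pd.read_csv(io.StringIO(text), header=0, names=names)


df = read_padded(temp)
print(df)
```

With the sample data above, the columns come out as COL_A, COL_B, COL_C and X3. For a real file you would open it and pass its contents (or adapt the helper to take a path) rather than a string literal.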
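A more general solution: the parameter header=None means that no column names are taken from the header, and skiprows=[0] skips the first row, which is missing the name of the last column: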
import pandas as pd
import io
temp=u'''"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
"ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"
"ROW4COLA","ROW4COLB","ROW4COLC","ROW4COLD"
"ROW5COLA","ROW5COLB","ROW5COLC","ROW5COLD"
"ROW6COLA","ROW6COLB","ROW6COLC","ROW6COLD"
"ROW7COLA","ROW7COLB","ROW7COLC","ROW7COLD"'''
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), header=None, skiprows=[0])
print(df)
0 1 2 3
0 ROW1COLA ROW1COLB ROW1COLC ROW1COLD
1 ROW2COLA ROW2COLB ROW2COLC ROW2COLD
2 ROW3COLA ROW3COLB ROW3COLC ROW3COLD
3 ROW4COLA ROW4COLB ROW4COLC ROW4COLD
4 ROW5COLA ROW5COLB ROW5COLC ROW5COLD
5 ROW6COLA ROW6COLB ROW6COLC ROW6COLD
6 ROW7COLA ROW7COLB ROW7COLC ROW7COLD
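With this second approach the original names are lost, but they can be put back afterwards. The sketch below is one possible way to do that, assuming the header row can be read separately with the standard csv module; the variable header is just an illustrative name. Columns that have no header keep their integer label.

```python
import csv
import io

import pandas as pd

temp = u'''"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
"ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"'''

# Read only the first line to recover the names that do exist.
header = next(csv.reader(io.StringIO(temp)))

# Parse the data rows without a header, as in the answer above.
df = pd.read_csv(io.StringIO(temp), header=None, skiprows=[0])

# Map positions 0, 1, 2 to the recovered names; position 3 has no
# name in the header, so it keeps its integer label.
df = df.rename(columns=dict(enumerate(header)))
print(df)
```

Note that this leaves a mix of string and integer column labels, so you may want to rename the remaining integer columns before further processing.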
Regarding "python - Pandas read_csv, reading a csv file with missing header elements", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36828348/