python - Pandas read_csv: reading a CSV file with a missing header element

Tags: python csv pandas

I am trying to import a CSV file with pandas.read_csv. The header row has three names, but every data row has four fields:

    "COL_A","COL_B","COL_C"
    "ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
    "ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
    "ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"
    "ROW4COLA","ROW4COLB","ROW4COLC","ROW4COLD"
    "ROW5COLA","ROW5COLB","ROW5COLC","ROW5COLD"
    "ROW6COLA","ROW6COLB","ROW6COLC","ROW6COLD"
    "ROW7COLA","ROW7COLB","ROW7COLC","ROW7COLD"

My first attempt was:

    data = pd.read_csv('broken.csv')

which gave me:

                 COL_A     COL_B     COL_C
    ROW1COLA  ROW1COLB  ROW1COLC  ROW1COLD
    ROW2COLA  ROW2COLB  ROW2COLC  ROW2COLD
    ROW3COLA  ROW3COLB  ROW3COLC  ROW3COLD
    ROW4COLA  ROW4COLB  ROW4COLC  ROW4COLD
    ROW5COLA  ROW5COLB  ROW5COLC  ROW5COLD
    ROW6COLA  ROW6COLB  ROW6COLC  ROW6COLD
    ROW7COLA  ROW7COLB  ROW7COLC  ROW7COLD

Setting index_col=False:

    data = pd.read_csv('broken.csv', index_col=False)

gives me:

          COL_A     COL_B     COL_C
    0  ROW1COLA  ROW1COLB  ROW1COLC
    1  ROW2COLA  ROW2COLB  ROW2COLC
    2  ROW3COLA  ROW3COLB  ROW3COLC
    3  ROW4COLA  ROW4COLB  ROW4COLC
    4  ROW5COLA  ROW5COLB  ROW5COLC
    5  ROW6COLA  ROW6COLB  ROW6COLC
    6  ROW7COLA  ROW7COLB  ROW7COLC

If I add prefix='X':

    data = pd.read_csv('broken.csv', index_col=False, prefix='X')

I get:

          COL_A     COL_B     COL_C
    0  ROW1COLA  ROW1COLB  ROW1COLC
    1  ROW2COLA  ROW2COLB  ROW2COLC
    2  ROW3COLA  ROW3COLB  ROW3COLC
    3  ROW4COLA  ROW4COLB  ROW4COLC
    4  ROW5COLA  ROW5COLB  ROW5COLC
    5  ROW6COLA  ROW6COLB  ROW6COLC
    6  ROW7COLA  ROW7COLB  ROW7COLC

The same happens with read_table:

    data = pd.read_table('broken.csv',index_col=True,sep=',')

Is there any way to have pandas assign the header names automatically and still pick up the values in the column that has no header?

Best answer

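I think you can use read_csv with header=0, so that the first row is treated as the header, and then override it with your own column names through the names parameter. You can omit sep=',' because the comma is the default separator: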

    import io

    import pandas as pd

    temp = '''"COL_A","COL_B","COL_C"
    "ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
    "ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
    "ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"
    "ROW4COLA","ROW4COLB","ROW4COLC","ROW4COLD"
    "ROW5COLA","ROW5COLB","ROW5COLC","ROW5COLD"
    "ROW6COLA","ROW6COLB","ROW6COLC","ROW6COLD"
    "ROW7COLA","ROW7COLB","ROW7COLC","ROW7COLD"'''
    # after testing, replace io.StringIO(temp) with the filename
    # header=0 consumes the short header row; names supplies all four column names
    df = pd.read_csv(io.StringIO(temp), header=0, names=['a', 'b', 'c', 'd'])

    print(df)

This prints:

              a         b         c         d
    0  ROW1COLA  ROW1COLB  ROW1COLC  ROW1COLD
    1  ROW2COLA  ROW2COLB  ROW2COLC  ROW2COLD
    2  ROW3COLA  ROW3COLB  ROW3COLC  ROW3COLD
    3  ROW4COLA  ROW4COLB  ROW4COLC  ROW4COLD
    4  ROW5COLA  ROW5COLB  ROW5COLC  ROW5COLD
    5  ROW6COLA  ROW6COLB  ROW6COLC  ROW6COLD
    6  ROW7COLA  ROW7COLB  ROW7COLC  ROW7COLD
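
If you would rather keep the names that are already in the file instead of typing a new list, one option is to read the header row yourself and pad it. This is only a minimal sketch: it assumes the file fits in memory and that every data row has at least as many fields as the header. The helper name `read_with_padded_header` and the `EXTRA_` prefix are made up for illustration and are not part of pandas.

```python
import csv
import io

import pandas as pd

# Shortened copy of the sample data: 3 header names, 4 fields per data row.
temp = '''"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"'''


def read_with_padded_header(text, extra_prefix="EXTRA_"):
    """Hypothetical helper: keep the file's header names and invent
    names for any data columns the header does not cover."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    # The widest data row decides how many column names are needed.
    width = max((len(row) for row in rows[1:]), default=len(header))
    names = header + [f"{extra_prefix}{i}" for i in range(len(header), width)]
    # header=0 discards the short header row; names supplies the full list.
    return pd.read_csv(io.StringIO(text), header=0, names=names)


df = read_with_padded_header(temp)
print(df.columns.tolist())
```

With the sample above the columns come out as COL_A, COL_B, COL_C, EXTRA_3, and all four values of each row are kept. For a real file you would open it twice (or read it into a string first) instead of using io.StringIO.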

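A more general solution is to pass header=None, so that no column names are taken from the file, together with skiprows=[0] to skip the first row, which is the one missing a name for the last column: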

    import io

    import pandas as pd

    temp = '''"COL_A","COL_B","COL_C"
    "ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
    "ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"
    "ROW3COLA","ROW3COLB","ROW3COLC","ROW3COLD"
    "ROW4COLA","ROW4COLB","ROW4COLC","ROW4COLD"
    "ROW5COLA","ROW5COLB","ROW5COLC","ROW5COLD"
    "ROW6COLA","ROW6COLB","ROW6COLC","ROW6COLD"
    "ROW7COLA","ROW7COLB","ROW7COLC","ROW7COLD"'''
    # after testing, replace io.StringIO(temp) with the filename
    # header=None: columns are numbered 0..n-1; skiprows=[0] drops the short header row
    df = pd.read_csv(io.StringIO(temp), header=None, skiprows=[0])

    print(df)

This prints:

              0         1         2         3
    0  ROW1COLA  ROW1COLB  ROW1COLC  ROW1COLD
    1  ROW2COLA  ROW2COLB  ROW2COLC  ROW2COLD
    2  ROW3COLA  ROW3COLB  ROW3COLC  ROW3COLD
    3  ROW4COLA  ROW4COLB  ROW4COLC  ROW4COLD
    4  ROW5COLA  ROW5COLB  ROW5COLC  ROW5COLD
    5  ROW6COLA  ROW6COLB  ROW6COLC  ROW6COLD
    6  ROW7COLA  ROW7COLB  ROW7COLC  ROW7COLD
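
The prefix='X' attempt in the question had no visible effect because prefix only applies when there is no header row, that is, with header=None. As far as I know, newer pandas releases deprecated that argument and later removed it, so relying on it may not work for you. A small sketch that produces the same X0, X1, ... style names by renaming the numbered columns after reading (the sample is shortened to two data rows):

```python
import io

import pandas as pd

# Shortened copy of the sample data: 3 header names, 4 fields per data row.
temp = '''"COL_A","COL_B","COL_C"
"ROW1COLA","ROW1COLB","ROW1COLC","ROW1COLD"
"ROW2COLA","ROW2COLB","ROW2COLC","ROW2COLD"'''

# header=None: take no names from the file; skiprows=[0]: drop the short header row.
df = pd.read_csv(io.StringIO(temp), header=None, skiprows=[0])

# The columns are the integers 0..3 at this point; prepend the prefix by hand.
df.columns = [f"X{i}" for i in df.columns]

print(df.columns.tolist())
```

Renaming after the read keeps the read_csv call itself free of arguments that depend on the pandas version.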

For python - Pandas read_csv: reading a CSV file with a missing header element, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36828348/
