python - pandas read_csv skiprows - determine rows to skip

Tags: python pandas

Below is a snippet of a csv file that starts with some dummy header lines, while the actual frame is anchored by beerId:

This work is an unpublished, copyrighted work and contains confidential information.  
beer consumption    
consumptiondate 7/24/2018
consumptionlab  H1
numbeerssuccessful  40
numbeersfailed  0
totalnumbeers   40
consumptioncomplete TRUE

beerId  Book
341027  Northern Light

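df = pd.read_csv(path_csv, header=8) works, but the problem is that the header is not always at row 8; its position depends on the day. I can't figure out how to use the lambda described in the help: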

skiprows : list-like or integer or callable, default None

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
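
As a small illustration of the callable form quoted above, here is a sketch on made-up in-memory data (the raw string and the cutoff of 2 are assumptions for the example, not taken from the file in the question). Note that the callable only receives the row index, never the text of the row, so on its own it cannot search for beerId; the header position still has to be known beforehand.

```python
import io

import pandas as pd

# Hypothetical data: two preamble lines, then the real header and one record.
raw = "junk line\nmore junk\nbeerId,Book\n341027,Northern Light\n"

# The callable is called with each 0-based row index; returning True skips that row.
df = pd.read_csv(io.StringIO(raw), skiprows=lambda i: i < 2)
print(df.columns.tolist())
```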

The goal is to find the index of the row that contains beerId.

Best answer

I think some preprocessing is needed first:

import pandas as pd

path_csv = 'file.csv'
with open(path_csv) as f:
    lines = f.readlines()

# indices of all lines that start with "beerId"
num = [i for i, l in enumerate(lines) if l.startswith("beerId")]
# header= ignores blank lines, so subtract 1 for the single blank line that
# precedes beerId in this file; fall back to 0 if no such line is found
num = 0 if len(num) == 0 else num[0] - 1
print(num)
# 8

df = pd.read_csv(path_csv, header=num)
print(df)
#              beerId  Book
# 0  341027  Northern Light
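
The subtraction of 1 above only holds when exactly one blank line precedes beerId. A possible variant, sketched under assumptions, passes the physical line number to skiprows instead, since skiprows counts every line of the file, blank ones included. The helper find_header_line, the temporary sample file, and the tab delimiter are all hypothetical and used only for illustration.

```python
import os
import tempfile

import pandas as pd


def find_header_line(path, marker="beerId"):
    """Return the 0-based physical line number of the first line starting with marker."""
    with open(path) as f:
        # Iterate lazily so the scan stops as soon as the marker is found.
        for i, line in enumerate(f):
            if line.startswith(marker):
                return i
    raise ValueError(f"no line starting with {marker!r} in {path}")


# Hypothetical sample mimicking the layout in the question (tab-separated is an assumption).
sample = (
    "This work is an unpublished, copyrighted work.\n"
    "beer consumption\n"
    "consumptiondate\t7/24/2018\n"
    "\n"
    "beerId\tBook\n"
    "341027\tNorthern Light\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path_csv = tmp.name

header_line = find_header_line(path_csv)
# skiprows counts physical lines (blank ones too), so no manual offset is needed.
df = pd.read_csv(path_csv, sep="\t", skiprows=header_line)
os.remove(path_csv)
print(df)
```

Raising an error when the marker is missing, rather than falling back to 0, is a design choice in this sketch: it avoids silently parsing the preamble as data.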

Regarding python - pandas read_csv skiprows - determine rows to skip, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51530785/
