python - Eliminating missing/empty values from a CSV dataset with Python

I have the following script, which validates a CSV file before I render it in a d3.js visualization I am building:

import csv

num_headers = 9

def url_escaper(data):
  for line in data:
    yield line.replace('&','&amp;')

with open("adzuna_input.csv", 'r') as file_in, open("adzuna_output.csv", 'w') as file_out:
    csv_in = csv.reader(url_escaper(file_in))
    csv_out = csv.writer(file_out)
    for i, row in enumerate(csv_in):

        if len(row) == num_headers:
            csv_out.writerow(row)
        else:
            print("line %d is malformed" % i)

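As you can see, I replace the character & with its escaped form &amp;. I tried to do something similar for whitespace or empty values, but it did not work well.

I think the best approach is to determine whether any column value is completely empty or consists entirely of whitespace, and then discard that row as malformed, just as I already do for rows that are too long or too short.

I am stuck on the logic for implementing this. Would it be something like this?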

for i, row in enumerate(csv_in):
    if i is null || whitespace:
        print "line %d is malformed" % i

Update

I tried this:

for i, row in enumerate(csv_in, starts):

    if row.strip() & len(row) == num_headers:
        csv_out.writerow(row)
    else:
        print "line %d is malformed" % i

But it fails with "AttributeError: 'list' object has no attribute 'strip'".

My input data looks like this:

http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2OTEyMjIifQ.qK3xtYQDxRpKJkNargPu6Jef4njm2fSZnNIVulRHoqA,Software Development Manager,Spring Technology ,Woolstone,52.042198,&,&,&,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI4NDM1MzgifQ.pYnBX-APPdB3edTRC_M8x6usmBq_GfIxcdZOXSLJN04,Data Scientists Python R Scala Java or Matlab,Aspire Data Recruitment,    ,,,United Kingdom,data science|java|python|scala|matlab|analysis,1

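^ The test file above contains an error in the second example: the second record should be rejected because of its empty fields.

Best answer

Use strip to check whether any element in the row is empty: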

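for i, row in enumerate(csv_in, start=1):
    # Keep the row only if it has the expected number of columns
    # and no field is empty or whitespace-only.
    if len(row) == num_headers and not [e for e in row if not e.strip()]:
        csv_out.writerow(row)
    else:
        print("line %d is malformed" % i)

e.strip() returns a non-empty string, which is truthy, whenever the element contains at least one non-whitespace character. For an empty or whitespace-only element it returns '', which is falsy, so not e.strip() is True and the element is collected by the list comprehension. If that list is non-empty, or the row has the wrong number of columns, we print "line %d is malformed" % i.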
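The truthiness of strip() can be checked in isolation. The following is a minimal sketch for Python 3; the helper name has_blank_field and the two sample rows are hypothetical, introduced only for illustration.

```python
def has_blank_field(row):
    """Return True if any field in row is empty or whitespace-only."""
    # str.strip() returns '' (falsy) for empty or whitespace-only strings,
    # so `not field.strip()` is True exactly for those fields.
    return any(not field.strip() for field in row)


# Hypothetical rows, loosely modelled on the input data in the question.
good_row = ["Software Development Manager", "Woolstone", "1"]
bad_row = ["Data Scientists", "    ", ""]

print(bool("   ".strip()))        # False
print(has_blank_field(good_row))  # False
print(has_blank_field(bad_row))   # True
```

Note that this check cannot be applied to the row itself, since a row from csv.reader is a list of strings; it has to be applied to each element.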

string.strip(s[, chars])

Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the both ends of the string this method is called on.

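Example file with a single column (so num_headers would be 1 for this test), in which the second line is empty and the line after it consists only of whitespace: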

test


test

Output:

line 2 is malformed
line 3 is malformed
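The test above can be reproduced without touching the file system. This is a sketch for Python 3 that assumes the sample file has a single column (num_headers = 1) and uses io.StringIO as a stand-in for the file; the list name malformed is introduced here only to make the result easy to inspect.

```python
import csv
import io

num_headers = 1  # assumption: the sample file has exactly one column

# Line 2 is empty and line 3 contains only spaces.
sample = io.StringIO("test\n\n   \ntest\n", newline="")

malformed = []
for i, row in enumerate(csv.reader(sample), start=1):
    # csv.reader yields [] for a completely blank line, so the length
    # check rejects line 2; the strip() check rejects line 3.
    if len(row) == num_headers and all(field.strip() for field in row):
        continue
    malformed.append(i)
    print("line %d is malformed" % i)
```

A blank line never produces an empty string field, which is why the column-count check and the strip() check are both needed.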

Note that enumerate starts at 0 by default, so I pass start=1 to get the correct line numbers.
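Putting the pieces together, the whole validation step might look roughly like the sketch below (Python 3). The function name filter_rows, the shortened example.com URLs, and the in-memory sample are assumptions made for illustration; url_escaper and the nine-column rule come from the question.

```python
import csv
import io

NUM_HEADERS = 9  # expected column count, as in the question


def url_escaper(lines):
    """Escape ampersands, as in the question's script."""
    for line in lines:
        yield line.replace("&", "&amp;")


def filter_rows(lines, num_headers=NUM_HEADERS):
    """Return (valid_rows, malformed_line_numbers) for the given lines."""
    valid, malformed = [], []
    for i, row in enumerate(csv.reader(url_escaper(lines)), start=1):
        # A row is valid only if it has the right number of columns
        # and every field contains a non-whitespace character.
        if len(row) == num_headers and all(f.strip() for f in row):
            valid.append(row)
        else:
            malformed.append(i)
    return valid, malformed


# Made-up sample with shortened URLs; the second record has blank fields.
sample = io.StringIO(
    "http://example.com/a,Manager,Spring Technology ,Woolstone,52.04,&,&,&,1\n"
    "http://example.com/b,Data Scientist,Aspire,    ,,,United Kingdom,python,1\n",
    newline="",
)
valid, malformed = filter_rows(sample)
print(len(valid), malformed)
```

Returning the rows instead of writing them keeps the sketch easy to test. When writing the valid rows to a real file with csv.writer in Python 3, the csv documentation recommends opening that file with newline="".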

Regarding python - eliminating missing/empty values from a CSV dataset with Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35138578/
