python - Replacing missing values in input in Python

Suppose your input has the following format:

id____value1____value2...valueN
1____hello____world...something
2________goodnight...world

Each run of 4 '_' characters stands for a '\t' (tab).

So far I get this result: the first item is {ID:1, value1:hello, value2:world,...,valueN:something} and the second item is {ID:2, value1: , value2:goodnight, ... , valueN: world}. I would like the final representation of the second item to be: {ID:2, value1:n/a , value2:goodnight, ... , valueN: world}

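I have written a Python script that reads the file line by line, but I would like to be able to check whether a '\t' is followed by another '\t' and, if so, insert the value 'n/a'.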

My code so far looks like this:

def myFunc():
    list = []
    with open(file, 'r') as f:
        header = f.readline()    # Store the header of the file for future reference (maybe). Don't comment out.
        for line in f:
            for i in range(len(line)):
                if line[i] == '\t':
                    if line[i+1] == '\t':
                        line[:i] + "n/a" + line[i:]
            list.append(line)   # Iterate through the file and store its values in the list.
    return list

Best answer

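Depending on how you want to use the list in the end, you could also use the csv module to do something like this, which is a bit more flexible when more than one column may have no entry: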

import csv

with open(file, 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    list = [[x if x else 'n/a' for x in line] for line in reader]

Now list will be a list of lists, each inner list holding the actual items of one row.

In [11]: print(header)
['id', 'value1', 'value2', 'value3']

In [12]: print(list)
[['1', 'hello', 'world', 'something'], ['2', 'n/a', 'goodnight', 'world']]
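The check described in the question (a tab followed directly by another tab) amounts to looking for empty fields once a line is split on tabs. A minimal sketch of that idea without the csv module follows, assuming fields never contain quoted tabs or embedded newlines (the csv module is the safer choice if they might). The function name `read_rows`, the `placeholder` parameter, and the in-memory sample data are hypothetical and used only for illustration.

```python
import io


def read_rows(f, placeholder="n/a"):
    """Return (header, rows) from a tab-separated file object.

    Empty fields are replaced by `placeholder`. Assumes no quoted fields.
    """
    header = f.readline().rstrip("\r\n").split("\t")
    rows = []
    for line in f:
        text = line.rstrip("\r\n")
        if not text:
            continue  # skip completely blank lines
        fields = text.split("\t")
        # An empty string between two tabs is a missing value.
        rows.append([field if field else placeholder for field in fields])
    return header, rows


# Hypothetical sample standing in for the real file.
sample = io.StringIO(
    "id\tvalue1\tvalue2\tvalue3\n"
    "1\thello\tworld\tsomething\n"
    "2\t\tgoodnight\tworld\n"
)
header, rows = read_rows(sample)
print(header)
print(rows)
```

Splitting into fields also catches an empty first or last column, which a character-by-character "tab followed by tab" test would miss.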

Edit, added after the comments below:

A slight modification of the above (using a Python 2.7+ dict comprehension) will give you dictionaries instead:

import csv

with open(file, 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    list = [{header[i]: line[i] if line[i] else 'n/a' for i in range(len(header))} for line in reader]

print(list)
# [{'value1': 'hello', 'value3': 'something', 'id': '1', 'value2': 'world'}, {'value1': 'n/a', 'value3': 'world', 'id': '2', 'value2': 'goodnight'}]
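As a possible variation, `csv.DictReader` builds the per-row dictionaries itself, and its `restval` argument supplies a value for columns that are missing entirely because a row is shorter than the header (a case in which indexing `line[i]` would raise an `IndexError`). This is only a sketch, not part of the original answer; the function name `read_records`, the `placeholder` parameter, and the sample data are hypothetical.

```python
import csv
import io


def read_records(f, placeholder="n/a"):
    """Return a list of dicts, one per row, with empty or missing fields filled."""
    # restval fills columns that are absent because the row is too short.
    reader = csv.DictReader(f, delimiter="\t", restval=placeholder)
    return [
        # Fields that are present but empty come back as '' and are replaced here.
        {key: (value if value else placeholder) for key, value in row.items()}
        for row in reader
    ]


# Hypothetical sample: the second row has an empty value1 and no value3 at all.
sample = io.StringIO(
    "id\tvalue1\tvalue2\tvalue3\n"
    "1\thello\tworld\tsomething\n"
    "2\t\tgoodnight\n"
)
records = read_records(sample)
print(records)
```

When reading from a real file, the csv documentation recommends opening it with `newline=''`.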

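You asked whether this is cleaner; that probably depends largely on how you intend to use the result. If you decide to inspect the results, the dictionary approach gives you something easier to read.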

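If you need to do a lot of data processing on the file, you may be interested in the pandas DataFrame data structure for this kind of thing. If you are not in that situation, though, this approach may well be complete overkill. A few simple examples of what it does (note, for instance, that it handles your original 'n/a' problem by default, reading the empty field as NaN):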

In [1]: import pandas as pd

In [5]: df = pd.read_csv('testfile', delimiter='\t')  # Or whatever your file is called

In [6]: df = df.set_index('id')

In [7]: df
Out[7]:
   value1     value2     value3
id
1   hello      world  something
2     NaN  goodnight      world

In [8]: df[df['value3'] == 'something']  # Find all rows with a given value3
Out[8]:
   value1 value2     value3
id
1   hello  world  something

In [10]: df[df['value2'] == 'goodnight']  # Find all rows with a given value2
Out[10]:
   value1     value2 value3
id
2     NaN  goodnight  world

In [11]: df['value1']  # Show only value1
Out[11]:
id
1    hello
2      NaN
Name: value1, dtype: object

Basically, any table operation you can think of has a natural way of being done in pandas.
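Note that pandas marks the empty field as NaN rather than the literal string 'n/a'. If the string itself is wanted, one option is `DataFrame.fillna`. The following is a small sketch under the assumption that pandas is installed; the in-memory sample stands in for the real file and is hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample standing in for 'testfile'.
sample = io.StringIO(
    "id\tvalue1\tvalue2\tvalue3\n"
    "1\thello\tworld\tsomething\n"
    "2\t\tgoodnight\tworld\n"
)

# Empty fields are parsed as missing values (NaN) by default.
df = pd.read_csv(sample, sep="\t", index_col="id")

# Replace every missing value with the literal string 'n/a'.
filled = df.fillna("n/a")
print(filled)
```

Keeping NaN is often preferable while working inside pandas, since methods such as `isna` and `dropna` rely on it; filling with a string is mostly useful just before exporting or displaying the data.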

Regarding python - Replacing missing values in input in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38876862/
