python - Resuming a nested for loop

Two files. One has broken data, the other has the fixes. Broken:

ID 0
T5 rat cake
~EOR~
ID 1
T1 wrong segg
T2 wrong nacob
T4 rat tart
~EOR~
ID 3
T5 rat pudding
~EOR~
ID 4
T1 wrong sausag
T2 wrong mspa
T3 strawberry tart 
~EOR~
ID 6
T5 with some rat in it 
~EOR~

Fixed:

ID 1
T1 eggs
T2 bacon
~EOR~
ID 4
T1 sausage
T2 spam
T4 bereft of loif
~EOR~

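EOR means end of record. Note that the broken file has more records than the fix file, and that the fix file contains both tags to fix (T1, T2, etc. are tags) and tags to add. This code does exactly what it is supposed to do: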

# foobar.py

import codecs

source = 'foo.dat'
target = 'bar.dat' 
result = 'result.dat'  

with codecs.open(source, 'r', 'utf-8_sig') as s, \
     codecs.open(target, 'r', 'utf-8_sig') as t, \
     codecs.open(result, 'w', 'utf-8_sig') as u: 

    sID = sT1 = sT2 = sT4 = ''
    RecordFound = False

    # get source data, record by record
    for sline in s:
        if sline.startswith('ID '):
            sID = sline
        if sline.startswith('T1 '):
            sT1 = sline
        if sline.startswith('T2 '):
            sT2 = sline
        if sline.startswith('T4 '):
            sT4 = sline
        if sline.startswith('~EOR~'):
            for tline in t: 
                # copy target file lines, replacing when necessary
                if tline == sID:
                    RecordFound = True
                if tline.startswith('T1 ') and RecordFound:
                    tline = sT1
                if tline.startswith('T2 ') and RecordFound:
                    tline = sT2 
                if tline.startswith('~EOR~') and RecordFound:
                    if sT4:
                        tline = sT4 + tline
                    RecordFound = False
                    u.write(tline)
                    break

                u.write(tline)

    for tline in t:
        u.write(tline)

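I'm writing to a new file because I don't want to mess up the other two. The first outer for loop finishes on the last record in the fix file. At that point there are still records in the target file left to write. That is what the final for clause does.

What's nagging me is that this last loop implicitly picks up where the first inner for loop last broke off. It is as if it should say "for the rest of tline in t". On the other hand, I don't see how I could do this with fewer (or not many more) lines of code (using dicts and what have you). Should I worry at all?

Please comment.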

Best answer

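I wouldn't worry. In your example, t is a file handle and you are iterating over it. File handles in Python are their own iterators; they hold state about how far into the file they have read, and they keep their place as you iterate over them. See the Python documentation for file.next() (spelled `__next__()` in Python 3) for more information.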
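To see this behaviour in isolation, here is a minimal sketch. It uses `io.StringIO` as a stand-in for a real file (an assumption, so that the example runs without any files on disk); the sample lines are taken from the broken file in the question.

```python
import io

# io.StringIO stands in for an open text file (assumption for this demo).
# Like a file object, it is its own iterator and remembers its position.
t = io.StringIO("ID 0\nT5 rat cake\n~EOR~\nID 1\nT1 wrong segg\n~EOR~\n")

first_record = []
for line in t:
    first_record.append(line.rstrip("\n"))
    if line.startswith("~EOR~"):
        break  # stop after the first record

# A second loop over the same object resumes where the first one stopped.
remaining = [line.rstrip("\n") for line in t]

print(first_record)  # ['ID 0', 'T5 rat cake', '~EOR~']
print(remaining)     # ['ID 1', 'T1 wrong segg', '~EOR~']
print(iter(t) is t)  # True: the object is its own iterator
```

Because `iter(t)` returns `t` itself, every `for` loop over `t` shares the same read position, which is why the final loop in the question only sees the lines that have not been consumed yet.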

See also another SO answer that also discusses iterators: What does the "yield" keyword do in Python? There is a lot of useful information there!
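As an illustration of how a generator keeps its place between items, the following sketch groups lines into records with `yield`. The function name `iter_records` is hypothetical (it appears in neither the question nor this answer), and `io.StringIO` is again only a stand-in for a file.

```python
import io

def iter_records(lines):
    """Yield each record as a list of lines, without the ~EOR~ marker.

    Hypothetical helper for illustration; lines after the last ~EOR~
    are silently dropped.
    """
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("~EOR~"):
            yield record
            record = []
        elif line:
            record.append(line)

fixed = io.StringIO("ID 1\nT1 eggs\nT2 bacon\n~EOR~\nID 4\nT1 sausage\n~EOR~\n")
records = iter_records(fixed)

first = next(records)  # runs the generator up to its first yield
rest = list(records)   # continues from there, not from the start

print(first)  # ['ID 1', 'T1 eggs', 'T2 bacon']
print(rest)   # [['ID 4', 'T1 sausage']]
```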

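Edit: Here is another way to combine them, using dictionaries. This approach may be preferable if you want to make other modifications to the records before writing them out: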

import sys

def get_records(source_lines):
    records = {}
    current_id = None
    for line in source_lines:
        if line.startswith('~EOR~'):
            continue
        # Split the line up on the first space
        tag, val = [l.rstrip() for l in line.split(' ', 1)]
        if tag == 'ID':
            current_id = val
            records[current_id] = {}
        else:
            records[current_id][tag] = val
    return records

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        broken = get_records(f)
    with open(sys.argv[2]) as f:
        fixed = get_records(f)

    # Merge the broken and fixed records
    repaired = broken
    for record_id in fixed:
        # Tags from the fixed file override tags from the broken file
        repaired[record_id] = {**broken.get(record_id, {}), **fixed[record_id]}

    with open(sys.argv[3], 'w') as f:
        for record_id, tags in sorted(repaired.items()):
            f.write('ID {}\n'.format(record_id))
            for tag, val in sorted(tags.items()):
                f.write('{} {}\n'.format(tag, val))
            f.write('~EOR~\n')

The {**broken.get(record_id, {}), **fixed[record_id]} part makes use of this: How to merge two Python dictionaries in a single expression?
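To show which side wins when both records carry the same tag, here is a small sketch that merges two tag dictionaries built from record 4 of the sample files. It assumes Python 3.5 or later for the `{**a, **b}` syntax; the `dict(a.items() + b.items())` form only works in Python 2.

```python
# Tags of record 4, taken from the sample files in the question.
broken = {"T1": "wrong sausag", "T2": "wrong mspa", "T3": "strawberry tart"}
fixed = {"T1": "sausage", "T2": "spam", "T4": "bereft of loif"}

# Later operands win on duplicate keys, so fixed values replace broken ones,
# while tags present in only one of the two dictionaries are kept.
merged = {**broken, **fixed}

# Equivalent two-step form that does not rely on the unpacking syntax.
merged_copy = dict(broken)
merged_copy.update(fixed)

for tag, val in sorted(merged.items()):
    print(tag, val)
```

T3 survives from the broken record, T1 and T2 take the fixed values, and T4 is added, which matches what the loop-based version in the question produces for this record.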

For more on python - Resuming a nested for loop, see the similar question on Stack Overflow: https://stackoverflow.com/questions/20271235/
