Python - Writing a separate file for each section of a single file

Tags: python python-2.7 parsing

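I have a .txt file that contains 5 sections of data. Each section has a header line "Section X". I want to parse this file and write the sections out to 5 separate files. A section starts at its header and ends just before the next section header. The code below creates 5 separate files; however, they are all blank.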

from itertools import cycle

filename = raw_input("Which file?: \n")

dimensionsList = ["Section 1", "Section 2",
    "Section 3", "Section 4", "Section 5"]

with open(filename+".txt", "rb") as oldfile:
    for i in dimensionsList:
        licycle = cycle(dimensionsList)
        nextelem = licycle.next()
        with open(i+".txt", "w") as newfile: 
            for line in oldfile:
                if line.strip() == i:
                    break
            for line in oldfile:
                if line.strip() == nextelem:
                    break
                newfile.write(line)

Best answer

Problem

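I tested your code, and it only works for Section 1 (the other sections come out blank for me too). I realized the problem is the transition between sections (and the fact that licycle is recreated, and therefore restarts, on every iteration).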
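A minimal illustration of the licycle point, using a made-up three-item list: when cycle() is rebuilt inside the loop, next() always hands back the first element. The sketch uses the built-in next(licycle), which works on Python 2.6+ and Python 3, instead of the Python 2-only licycle.next().

```python
from itertools import cycle

dimensions = ["Section 1", "Section 2", "Section 3"]

# Pattern from the question: a fresh cycle is built on every pass,
# so next() always returns the first element.
buggy = []
for header in dimensions:
    licycle = cycle(dimensions)
    buggy.append(next(licycle))

# Building the cycle once, outside the loop, lets it advance.
fixed = []
licycle = cycle(dimensions)
for header in dimensions:
    fixed.append(next(licycle))

print(buggy)  # ['Section 1', 'Section 1', 'Section 1']
print(fixed)  # ['Section 1', 'Section 2', 'Section 3']
```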

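The "Section 2" header line is consumed by the second for loop (the break on if line.strip() == nextelem). So when the next iteration's first loop starts looking for "Section 2", the next line it reads is already Section 2's data (not the text Section 2), and the header is never found.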
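The header-consumption effect can be reproduced in isolation with an in-memory file; io.StringIO and the short sample text are stand-ins for the real input file. Both for loops share one file iterator, so the line that triggers break is gone for good and a later search for it cannot succeed.

```python
import io

# Stand-in for the real input file (u"" keeps it valid on Python 2.7 too)
f = io.StringIO(u"Section 1\naaaa\nSection 2\nccc\n")

# First loop: skip up to and including the "Section 1" header.
for line in f:
    if line.strip() == "Section 1":
        break

# Second loop: copy lines until the next header; the break consumes "Section 2".
copied = []
for line in f:
    if line.strip() == "Section 2":
        break
    copied.append(line.strip())

# A new search for "Section 2" starts after that header, so it never matches.
found = any(line.strip() == "Section 2" for line in f)

print(copied)  # ['aaaa']
print(found)   # False
```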

It's hard to explain, but try the code below:

from itertools import cycle

filename = raw_input("Which file?: \n")

dimensionsList = ["Section 1", "Section 2", "Section 3", "Section 4",
                  "Section 5"]

with open(filename + ".txt", "rb") as oldfile:
    licycle = cycle(dimensionsList)
    nextelem = licycle.next()
    for i in dimensionsList:
        print(nextelem)
        with open(i + ".txt", "w") as newfile:
            for line in oldfile:
                print("ignoring %s" % (line.strip()))
                if line.strip() == i:
                    nextelem = licycle.next()
                    break
            for line in oldfile:
                if line.strip() == nextelem:
                    # nextelem = licycle.next()
                    print("ignoring %s" % (line.strip()))
                    break
                print("printing %s" % (line.strip()))
                newfile.write(line)
            print('')

It prints:

Section 1
ignoring Section 1
printing aaaa
printing bbbb
ignoring Section 2

Section 2
ignoring ccc
ignoring ddd
ignoring Section 3
ignoring eee
ignoring fff
ignoring Section 4
ignoring ggg
ignoring hhh
ignoring Section 5
ignoring iii
ignoring jjj

Section 2

Section 2

Section 2

It works for Section 1 and detects Section 2, but then it keeps ignoring lines because it never finds "Section 2".

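If the line iteration restarted each time (always starting from line 1), I think that program would work. But I wrote simpler code that should work for you.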
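A sketch of that "restart from line 1" idea, assuming the input is seekable: copy_section is a hypothetical helper that calls f.seek(0) before every search, and io.StringIO again stands in for the real file. It works, but it rescans the file once per section, which is why the single-pass solution in this answer is the simpler choice.

```python
import io


def copy_section(f, header, next_header):
    """Hypothetical helper: rewind f, then return the body lines of `header`."""
    f.seek(0)  # restart from line 1 for every section
    for line in f:
        if line.strip() == header:
            break
    else:
        return []  # header not found anywhere in the file
    body = []
    for line in f:
        if line.strip() == next_header:
            break
        body.append(line)
    return body


headers = ["Section 1", "Section 2", "Section 3"]
f = io.StringIO(u"Section 1\naaaa\nSection 2\nccc\nSection 3\neee\n")
sections = {}
for i, header in enumerate(headers):
    # The last section has no successor; None never equals a stripped line,
    # so that section runs to the end of the file.
    nxt = headers[i + 1] if i + 1 < len(headers) else None
    sections[header] = [ln.strip() for ln in copy_section(f, header, nxt)]
print(sections)
```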

Solution

from itertools import cycle

filename = raw_input("Which file?: \n")

dimensionsList = ["Section 1", "Section 2", "Section 3", "Section 4",
                  "Section 5"]

with open(filename + ".txt", "rb") as oldfile:

    licycle = cycle(dimensionsList)
    nextelem = licycle.next()
    newfile = None
    line = oldfile.readline()

    while line:

        # Case 1: Found new section
        if line.strip() == nextelem:
            if newfile is not None:
                newfile.close()
            nextelem = licycle.next()
            newfile = open(line.strip() + '.txt', 'w')

        # Case 2: Print line to current section
        elif newfile is not None:
            newfile.write(line)

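        line = oldfile.readline()

    # Close the last section's file; the loop only closes the earlier ones
    if newfile is not None:
        newfile.close()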

When it finds a section header, it starts writing to a new file. Otherwise, it keeps writing to the current file.
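For Python 3, a hedged port of the same single-pass logic might look like the sketch below; split_into_files and its parameters are hypothetical names, and the sample data is a shortened version of the answer's file. Two changes matter: the input is opened in text mode, since in Python 3 "rb" yields bytes that never compare equal to a str header, and next(licycle) replaces licycle.next(). A finally clause makes sure the last output file gets closed.

```python
import os
import tempfile
from itertools import cycle


def split_into_files(src_path, headers, out_dir="."):
    """Single pass: each expected header line starts a new '<header>.txt' file."""
    licycle = cycle(headers)
    nextelem = next(licycle)
    newfile = None
    try:
        with open(src_path) as oldfile:  # text mode, so lines are str
            for line in oldfile:
                # Case 1: found the next expected section header
                if line.strip() == nextelem:
                    if newfile is not None:
                        newfile.close()
                    nextelem = next(licycle)
                    newfile = open(os.path.join(out_dir, line.strip() + ".txt"), "w")
                # Case 2: copy the line into the current section's file
                elif newfile is not None:
                    newfile.write(line)
    finally:
        # Close the last section's file, even if the loop was interrupted
        if newfile is not None:
            newfile.close()


# Usage: split a small sample inside a throwaway directory and read it back
sample = "Section 1\naaaa\nbbbb\nSection 2\nccc\nddd\n"
names = ["Section 1", "Section 2"]
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    with open(src, "w") as f:
        f.write(sample)
    split_into_files(src, names, tmp)
    result = {}
    for name in names:
        with open(os.path.join(tmp, name + ".txt")) as f:
            result[name] = f.read()
print(result)
```

Iterating with for line in oldfile is equivalent to the readline() loop here, because the code never mixes the two styles on the same file.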

PS: Here is the sample file I used:

Section 1
aaaa
bbbb
Section 2
ccc
ddd
Section 3
eee
fff
Section 4
ggg
hhh
Section 5
iii
jjj

A similar question about writing a separate file for each section of a single file can be found on Stack Overflow: https://stackoverflow.com/questions/44146586/
