python3: split a large file into small files by a delimiter line (not by size)

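Newbie here. The end goal is to learn how to take two large yaml files and split them into several hundred small files. I haven't figured out how to use the ID # as the filename yet, so one thing at a time.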

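First: split the large file into many. Here is a small part of my test data file, test-file.yml. Each post is preceded by a delimiter line containing only a '-':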

-
    ID: 627
    more_post_meta_data_and_content
-
    ID: 628

Here is my code, which doesn't work. So far I can't see why:

with open('test-file.yml', 'r') as myfile:
    start = 0
    cntr = 1
    holding = ''
    for i in myfile.read().split('\n'):
        if (i == '-\n'):
            if start==1:
                with open(str(cntr) + '.md','w') as opfile:
                    opfile.write(op)
                    opfile.close()
                    holding=''
                    cntr += 1
            else:
                start=1
        else:
            if holding =='':
                holding = i
            else:
                holding = holding + '\n' + i
    myfile.close()

All hints, suggestions, and pointers are welcome. Thanks.

Best answer

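If the input file is large, reading the whole file into memory and then splitting that string is very inefficient. Iterate over the file line by line instead. Try this: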

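with open('test-file.yml', 'r') as myfile:
    opfile = None
    cntr = 1
    for line in myfile:
        # A line holding only '-' starts a new post, so switch output files.
        if line.rstrip('\n') == '-':
            if opfile is not None:
                opfile.close()
            opfile = open('{0}.md'.format(cntr), 'w')
            cntr += 1
        # Skip anything that appears before the first delimiter.
        if opfile is not None:
            opfile.write(line)
    # The last output file is still open when the loop ends.
    if opfile is not None:
        opfile.close()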
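
As for why the loop in the question never writes a file: `str.split('\n')` removes the newline characters, so no piece can ever equal `'-\n'`. Even if the comparison had matched, `opfile.write(op)` would raise a `NameError`, because the accumulated text is stored in `holding`, not `op`. A small demonstration, using a made-up sample string shaped like test-file.yml:

```python
# Sample text shaped like test-file.yml (contents are illustrative only).
sample = "-\n    ID: 627\n    content\n-\n    ID: 628\n"

# split('\n') drops the separators, so a piece is '-' and never '-\n'.
pieces = sample.split('\n')
matches_with_newline = [p for p in pieces if p == '-\n']
matches_without_newline = [p for p in pieces if p == '-']

# Iterating line by line (as a file object does) keeps the trailing '\n'.
lines = sample.splitlines(keepends=True)
matches_in_lines = [ln for ln in lines if ln == '-\n']

print(len(matches_with_newline), len(matches_without_newline), len(matches_in_lines))
# prints: 0 2 2
```

This is why comparing against `'-\n'` works when looping over the file object directly, but not when looping over the result of `read().split('\n')`.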

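Also note that you should not close a file that was opened by a with statement (the myfile.close() and opfile.close() calls in the question); the whole point of the context manager is to handle that for you.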
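
The question also mentions wanting to use the ID # as the filename later. The following is only a minimal sketch of one way that might look, not part of the original answer. It assumes every post contains a line of the form `ID: <digits>`; the names `ID_LINE`, `write_post`, and `split_posts_by_id` are hypothetical. It holds one post in memory at a time (not the whole file) and lets `with` close each output file.

```python
import os
import re

# Assumed shape of the ID line, e.g. "    ID: 627" (hypothetical pattern).
ID_LINE = re.compile(r'^\s*ID:\s*(\d+)\s*$')


def write_post(out_dir, post_id, post_lines):
    """Write one post to <out_dir>/<post_id>.md and return the path."""
    path = os.path.join(out_dir, '{0}.md'.format(post_id))
    with open(path, 'w') as opfile:  # the with block closes opfile for us
        opfile.writelines(post_lines)
    return path


def split_posts_by_id(src_path, out_dir):
    """Split a '-'-delimited file into one file per post, named by ID."""
    written = []
    post_lines, post_id = [], None
    with open(src_path, 'r') as myfile:
        for line in myfile:
            if line.rstrip('\n') == '-':
                # Delimiter: flush the previous post if it had an ID.
                if post_id is not None:
                    written.append(write_post(out_dir, post_id, post_lines))
                post_lines, post_id = [], None
            else:
                match = ID_LINE.match(line)
                if match and post_id is None:
                    post_id = match.group(1)
            post_lines.append(line)
    # Flush the final post, which has no delimiter after it.
    if post_id is not None:
        written.append(write_post(out_dir, post_id, post_lines))
    return written
```

Under these assumptions, calling `split_posts_by_id('test-file.yml', '.')` on the sample data from the question should produce `627.md` and `628.md`. Posts without an ID line, and any text before the first delimiter, are skipped in this sketch.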

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/55074718/
