Python: fastest way to read a large text file (several GB)

Tags: python performance optimization line chunking

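I have a large text file (~7 GB) and I am looking for the fastest way to read it. I have been reading about several approaches that read the file chunk by chunk to speed up processing.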
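"Chunk" can also mean fixed-size byte blocks rather than groups of lines. The sketch below shows that variant. It assumes you only need something that works on raw bytes, such as counting newlines, because a byte block can cut a line in half. If you need whole lines, you would have to carry the partial last line over to the next block yourself. The function names and the 1 MiB default block size are hypothetical choices for illustration.

```python
from typing import Iterator


def iter_blocks(path: str, block_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield successive byte blocks of at most `block_size` bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(block_size)
        # until it returns the sentinel b"", which signals end of file.
        for block in iter(lambda: f.read(block_size), b""):
            yield block


def count_newlines(path: str, block_size: int = 1 << 20) -> int:
    """Count newline bytes without decoding the file or splitting it into lines."""
    return sum(block.count(b"\n") for block in iter_blocks(path, block_size))
```

For example, count_newlines("sample.txt") returns the number of newline characters in the file. Memory use is bounded by block_size regardless of how large the file is.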

For example, effbot suggests:

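# File: readline-example-3.py

with open("sample.txt") as file:
    while True:
        # readlines(hint) returns whole lines, stopping once their total
        # size exceeds about 100000 characters; [] means end of file.
        lines = file.readlines(100000)
        if not lines:
            break
        for line in lines:
            pass  # do something with line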
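The readlines(hint) pattern can be wrapped in a reusable generator. This is a minimal sketch, assuming the file is UTF-8 text; iter_line_batches and count_lines are hypothetical names. According to the io documentation, readlines stops adding lines once their total size exceeds hint, so each batch contains only whole lines and may run slightly past hint.

```python
from typing import Iterator, List


def iter_line_batches(path: str, hint: int = 100_000) -> Iterator[List[str]]:
    """Yield lists of whole lines, each list holding roughly `hint` characters."""
    # Assumption: the file is UTF-8 encoded text.
    with open(path, encoding="utf-8") as f:
        while True:
            lines = f.readlines(hint)
            if not lines:  # an empty list means end of file
                break
            yield lines


def count_lines(path: str, hint: int = 100_000) -> int:
    """Count lines by summing the sizes of the batches."""
    return sum(len(batch) for batch in iter_line_batches(path, hint))
```

Batching mainly reduces the number of Python-level loop iterations per call; whether it beats plain iteration on a given machine is something to measure rather than assume.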

effbot's snippet is quoted as processing 96,900 lines of text per second. Other authors suggest using islice():

from itertools import islice

with open(...) as f:
    while True:
        next_n_lines = list(islice(f, n))
        if not next_n_lines:
            break
        # process next_n_lines

list(islice(f, n)) returns a list of the next n lines of the file f. Using it inside a loop gives you the file in chunks of n lines.
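The islice loop above can be packaged as a generator that yields fixed-count chunks. A minimal sketch, assuming UTF-8 text; iter_n_lines is a hypothetical name. It works because a file object is its own iterator, so each islice call resumes exactly where the previous one stopped.

```python
from itertools import islice
from typing import Iterator, List


def iter_n_lines(path: str, n: int) -> Iterator[List[str]]:
    """Yield lists of exactly `n` lines; the last list may be shorter."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: the file is UTF-8 encoded text.
    with open(path, encoding="utf-8") as f:
        while True:
            # islice pulls at most n lines from the file iterator,
            # continuing from where the previous call left off.
            chunk = list(islice(f, n))
            if not chunk:  # nothing left to read
                break
            yield chunk
```

Unlike readlines(hint), the chunk size here is a line count rather than a character budget, which is convenient when downstream code expects a fixed number of records per batch.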

Best answer

with open(<FILE>) as FileObj:
    for line in FileObj:
        print(line)  # or do some other thing with the line...

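Iterating over the file object reads one line into memory at a time (the file object buffers the underlying reads), and the with statement closes the file when the block finishes, even if an exception is raised.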
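Since the fastest option depends on the machine, the disk, and what is done with each line, one way to decide is to time the three approaches on your own file. The harness below is a sketch, assuming UTF-8 text; all function names and the default hint and n values are hypothetical. Each reader only counts lines, so it measures reading overhead, not your real per-line work.

```python
import time
from itertools import islice
from typing import Callable, Dict, Tuple


def by_iteration(path: str) -> int:
    """Plain `for line in f` loop, as in the answer above."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for _line in f:
            count += 1
    return count


def by_readlines(path: str, hint: int = 100_000) -> int:
    """Batches from readlines(hint), as in the effbot snippet."""
    count = 0
    with open(path, encoding="utf-8") as f:
        while True:
            lines = f.readlines(hint)
            if not lines:
                break
            count += len(lines)
    return count


def by_islice(path: str, n: int = 10_000) -> int:
    """Batches of n lines via islice, as in the second snippet."""
    count = 0
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = list(islice(f, n))
            if not chunk:
                break
            count += len(chunk)
    return count


def benchmark(
    path: str, readers: Dict[str, Callable[[str], int]]
) -> Dict[str, Tuple[int, float]]:
    """Return {name: (line_count, lines_per_second)} for each reader."""
    results = {}
    for name, reader in readers.items():
        start = time.perf_counter()
        lines = reader(path)
        elapsed = time.perf_counter() - start
        rate = lines / elapsed if elapsed > 0 else float("inf")
        results[name] = (lines, rate)
    return results
```

Usage: call benchmark("sample.txt", {"iterate": by_iteration, "readlines": by_readlines, "islice": by_islice}) and print each rate. The first read of a large file usually warms the operating system's page cache, so run the whole benchmark more than once (or vary the order of the readers) before comparing numbers.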

A similar question about the fastest way to read a large text file (several GB) in Python is on Stack Overflow: https://stackoverflow.com/questions/14944183/
