python - Fastest way to re-read a file in Python?

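I have a file containing a list of names and their positions (start - end).

My script iterates over that file and, for each name, reads a second file of information, checks whether each line falls between those positions, and then computes something from the matching lines.

Currently it reads the entire second file (60 MB) line by line and checks whether each line falls between start and end. It does this for every name in the first list (about 5000). What is the fastest way to collect the data between those bounds instead of re-reading the whole file 5000 times?

Sample code for the second loop: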

for line in file:
    # the third whitespace-separated column holds the position
    position = int(line.split()[2])
    if start <= position <= end:
        Dosomethingwithline()

Edit: Loading the file into a list above the first loop and iterating over that list improved the speed.

with open("filename.txt", 'r') as f:
    file2 = f.readlines()
for line in file:
    [...]
    for line2 in file2:
       [...]
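
Going one step beyond the list in the edit, the second file can be parsed once, sorted by position, and then queried with binary search for each name. This is only a minimal sketch under assumptions: the position is taken to be the third whitespace-separated column, as in the sample loop, and `load_positions` and `lines_between` are hypothetical helper names, not part of the original script.

```python
import bisect


def load_positions(path):
    """Parse the file once; return (positions, lines), both sorted by position."""
    rows = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue  # assumption: lines without a third column are skipped
            rows.append((int(fields[2]), line))
    rows.sort(key=lambda row: row[0])
    positions = [pos for pos, _ in rows]
    lines = [line for _, line in rows]
    return positions, lines


def lines_between(positions, lines, start, end):
    """Return the lines whose position satisfies start <= position <= end."""
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_right(positions, end)
    return lines[lo:hi]
```

With this layout, each of the roughly 5000 lookups costs two binary searches plus a slice rather than a full pass over the file. The lines come back in position order, not file order.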

Best answer

You can use the mmap module to map the file into memory and then iterate over it.

Example:

import mmap

# write a simple example file
with open("hello.txt", "wb") as f:
    f.write(b"Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # memory-map the file, size 0 means whole file
    mm = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print(mm.readline())  # prints b"Hello Python!\n"
    # read content via slice notation
    print(mm[:5])  # prints b"Hello"
    # update content using slice notation;
    # note that new content must have same size
    mm[6:] = b" world!\n"
    # ... and read again using standard file methods
    mm.seek(0)
    print(mm.readline())  # prints b"Hello  world!\n"
    # close the map
    mm.close()
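
Applied to the loop from the question, the same idea might look like the sketch below. It assumes the position is the third whitespace-separated column and that the file is not empty (mapping an empty file raises `ValueError`). `matching_lines` is a hypothetical name used only for illustration.

```python
import mmap


def matching_lines(path, start, end):
    """Yield lines (as bytes) whose third column lies in [start, end]."""
    with open(path, "rb") as f:
        # length 0 maps the whole file; ACCESS_READ keeps the mapping read-only
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # readline() returns b"" once the end of the map is reached
            for line in iter(mm.readline, b""):
                fields = line.split()
                if len(fields) >= 3 and start <= int(fields[2]) <= end:
                    yield line
```

This still visits every line for each query. The mapping mainly avoids repeated read calls, so whether it is faster than the in-memory list is worth measuring on the real data.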

For "python - Fastest way to re-read a file in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28108972/
