python - Does the list returned by readlines need RAM to be stored?

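I am calling lineslist = file.readlines() on a 2GB file.

So I guess it will create a lineslist list that is 2GB or larger. Is that basically the same as readfile = file.read(), which also creates a 2GB readfile (instance/variable?)?

Why should I prefer readlines in this case?

Apart from that, I have one more question, about something mentioned here https://docs.python.org/2/tutorial/inputoutput.html :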

readline(): a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous;

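I don't understand the last point. Does the same hold for readlines()? If there is no \n at the end of the file, does the last element of its list also have an unambiguous value?

We are merging files (that were split by block size), so I am trying to choose between readlines and read. After splitting, the individual files may not end with \n, so if readlines returns an unambiguous value, I think that would be a problem.

PS: I haven't learned Python yet. So if there is no such thing as an instance in Python, or if I am talking rubbish, please forgive me. I am just assuming.

Edit:

OK, I just found out. It doesn't return any unambiguous output.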

len(lineslist)
6923798
lineslist[6923797]
"\xf4\xe5\xcf1)\xff\x16\x93\xf2\xa3-\....\xab\xbb\xcd"

So it does not end with "\n". But it is not an unambiguous output either.

Also, readline does not give an unambiguous output for the last line either.
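
For illustration, the behaviour described in the quoted tutorial passage can be checked with a small sketch. It uses io.StringIO as a stand-in for a real file, which is an assumption made only to keep the example self-contained. "Unambiguous" in the tutorial refers to telling a blank line ("\n") apart from end of file (an empty string), not to the content of the last line.

```python
import io

# Stand-in for a file whose last line has no trailing newline.
f = io.StringIO("first\n\nlast")

lines_read = []
while True:
    line = f.readline()
    if line == "":           # empty string: end of file reached
        break
    lines_read.append(line)  # a blank line arrives as "\n", never as ""

# readlines() returns the same strings, newline characters included.
all_lines = io.StringIO("first\n\nlast").readlines()

# Joining the pieces reproduces the original data exactly.
rebuilt = "".join(all_lines)
```

Because every newline is kept and nothing is appended to the last line, concatenating the returned strings gives back the original contents, whether or not the file ends with \n.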

Best answer

If I understand your question correctly, you just want to merge (i.e. concatenate) files.

If memory is a concern, for line in f is usually the way to go.
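
To make the memory point concrete, here is a rough sketch that compares the footprint of one big string (as read() would return) with a list of line strings (as readlines() would return). It assumes CPython, where sys.getsizeof reports per-object sizes. The helper approx_size and the sample text are hypothetical, and a small in-memory string stands in for the large file.

```python
import io
import sys

def approx_size(lines):
    """Rough footprint in bytes: the list object plus every string it holds."""
    return sys.getsizeof(lines) + sum(sys.getsizeof(s) for s in lines)

# Small stand-in for the contents of a large file.
text = "some,csv,row\n" * 1000

size_read = sys.getsizeof(text)                              # like f.read()
size_readlines = approx_size(io.StringIO(text).readlines())  # like f.readlines()

# Iterating keeps only the current line alive at any moment.
longest = 0
for line in io.StringIO(text):
    longest = max(longest, len(line))
```

In this sketch the list of lines takes more memory than the single string, because each line is a separate object with its own overhead. Iterating over the file object avoids holding all lines at once.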

I ran a benchmark with a 1.9GB csv file. One possible alternative is to read the data in large chunks that fit in memory.

Code:

#read in large chunks - fastest in my test
chunksize = 2**16
with open(fn,'r') as f:
    chunk = f.read(chunksize)
    while chunk:
        chunk = f.read(chunksize)
#1 loop, best of 3: 4.48 s per loop

#read whole file in one go - slowest in my test
with open(fn,'r') as f:
    chunk = f.read()
#1 loop, best of 3: 11.7 s per loop

#read file using iterator over each line - most practical for most cases
with open(fn,'r') as f:
    for line in f:
        s = line
#1 loop, best of 3: 6.74 s per loop
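
A hedged sketch of how timings like these could be reproduced with the standard timeit module. The function names are hypothetical, and binary mode is used here as an assumption so that the three strategies can be compared by byte count. The absolute numbers will depend on the machine, the file, and the Python version.

```python
import timeit

def read_in_chunks(path, chunksize=2**16):
    """Read the file in fixed-size chunks; return the number of bytes seen."""
    total = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunksize), b""):
            total += len(chunk)
    return total

def read_whole(path):
    """Read the whole file in one call; return its length in bytes."""
    with open(path, "rb") as f:
        return len(f.read())

def read_by_line(path):
    """Iterate over the file line by line; return the number of bytes seen."""
    total = 0
    with open(path, "rb") as f:
        for line in f:
            total += len(line)
    return total

def time_readers(path, repeat=3):
    """Return the best wall-clock time in seconds for each strategy."""
    readers = (read_in_chunks, read_whole, read_by_line)
    return {
        r.__name__: min(timeit.repeat(lambda: r(path), number=1, repeat=repeat))
        for r in readers
    }
```

Calling time_readers(fn) with the path of a large file returns a dictionary that maps each function name to its best time.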

Knowing this, you could write something like:

with open(outputfile,'w') as fo:
    for inputfile in inputfiles: #assuming inputfiles is a list of filepaths
        with open(inputfile,'r') as fi:
            #read until fi.read returns '' (end of file); chunksize as defined above
            for chunk in iter(lambda: fi.read(chunksize), ''):
                fo.write(chunk) #write the chunk that was just read
            fo.write('\n') #newline between each file(might not be necessary)
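
Since the sample output in the question looks like raw bytes rather than text, a binary-mode variant may be a safer fit for reassembling files that were split by block size. The following is a minimal sketch under that assumption. The function name concatenate_files is hypothetical; shutil.copyfileobj is the standard-library helper that performs the chunked copy. No separator is written between the parts, because inserting a newline would change the reassembled data.

```python
import shutil

def concatenate_files(input_paths, output_path, chunksize=2**16):
    """Append each input file to output_path, copying chunksize bytes at a time."""
    with open(output_path, "wb") as fo:
        for path in input_paths:
            with open(path, "rb") as fi:
                # Copies in chunks, so memory use stays near chunksize.
                shutil.copyfileobj(fi, fo, chunksize)
```

Usage would look like concatenate_files(inputfiles, outputfile), with inputfiles listed in the order the parts were split.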

Regarding "python - Does the list returned by readlines need RAM to be stored?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36469680/
