python - file.tell() 不一致

有没有人碰巧知道为什么要以这种方式遍历文件:

输入:

f = open('test.txt', 'r')
for line in f:
    print "f.tell(): ",f.tell()

输出:

f.tell(): 8192
f.tell(): 8192
f.tell(): 8192
f.tell(): 8192

我一直从 tell() 获取错误的文件索引，但是，如果我使用 readline，我会为 tell() 获取适当的索引:

输入:

f = open('test.txt', 'r')
while True:
    line = f.readline()
    if (line == ''):
        break
    print "f.tell(): ",f.tell()

输出:

f.tell(): 103
f.tell(): 107
f.tell(): 115
f.tell(): 124

顺便说一句，我正在运行 python 2.7.1。

最佳答案

将打开的文件用作迭代器使用预读缓冲区来提高效率。因此，当您遍历这些行时，文件指针会在文件中大步前进。

来自File Objects文档:

In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.

如果您需要依赖.tell()，请不要将文件对象用作迭代器。您可以将 .readline() 改为迭代器(以牺牲一些性能为代价):

for line in iter(f.readline, ''):
    print f.tell()

这使用了 iter() function sentinel 参数将任何可调用对象转换为迭代器。

关于python - file.tell() 不一致，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/14341084/

上一篇：python - 如何确定多个外键中SQL插入的顺序？

下一篇：python - mod_python 没有名为 _apache 的模块

相关文章：

c# - 如何向外部证明数据文件未被篡改？

python - 读取二进制文件并遍历每个字节

python - 使用 tf.data.Dataset 使保存的模型更大

python - 在 Python 中获取多个元组列表的第二个元素的交集的简单有效的方法？

python - 使用python排列文本文件

python - 无法使用 python PDFKIT 错误 : "No wkhtmltopdf executable found:" 创建 pdf

python - Pandas 数据框，将 3 列分组并计算第三列

python - 使用 pip 和 virtualenv 逐步设置 python？

python - 从一个 txt 文件中选取部分并使用 python 复制到另一个文件

python - 如何使用 pandas 识别 CSV 文件中的空单元格