C: Reading a text file (with variable-length lines) line by line using fread()/fgets() instead of fgetc() (block I/O vs. character I/O)

Is there a getline function that uses fread (block I/O) instead of fgetc (character I/O)?

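Reading a file character by character via fgetc carries a performance penalty. We think that, to improve performance, we could use block reads via fread in the inner loop of getline. However, this introduces the potentially undesirable effect of reading past the end of the line. At the very least, the getline implementation would have to keep track of the "unread" part of the file, which requires an abstraction beyond the ANSI C FILE semantics. This is not something we want to implement ourselves!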

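We have profiled our application, and the slow performance comes down to the fact that we consume large files one character at a time via fgetc. The rest of the overhead is trivial by comparison. We always read every line of the file sequentially, from start to end, and we can lock the entire file for the duration of the read. This probably makes an fread-based getline easier to implement.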

So, does a getline function exist that uses fread (block I/O) instead of fgetc (character I/O)? We are pretty sure one does, but if not, how should we implement it?

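Update: Found a useful article, Handling User Input in C, by Paul Hsieh. It takes an fgetc-based approach, but it has an interesting discussion of the alternatives (starting with how bad gets is, then discussing fgets):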

On the other hand the common retort from C programmers (even those considered experienced) is to say that fgets() should be used as an alternative. Of course, by itself, fgets() doesn't really handle user input per se. Besides having a bizarre string termination condition (upon encountering \n or EOF, but not \0) the mechanism chosen for termination when the buffer has reached capacity is to simply abruptly halt the fgets() operation and \0 terminate it. So if user input exceeds the length of the preallocated buffer, fgets() returns a partial result. To deal with this programmers have a couple choices; 1) simply deal with truncated user input (there is no way to feed back to the user that the input has been truncated, while they are providing input) 2) Simulate a growable character array and fill it in with successive calls to fgets(). The first solution, is almost always a very poor solution for variable length user input because the buffer will inevitably be too large most of the time because its trying to capture too many ordinary cases, and too small for unusual cases. The second solution is fine except that it can be complicated to implement correctly. Neither deals with fgets' odd behavior with respect to '\0'.

Exercise left to the reader: In order to determine how many bytes was really read by a call to fgets(), one might try by scanning, just as it does, for a '\n' and skip over any '\0' while not exceeding the size passed to fgets(). Explain why this is insufficient for the very last line of a stream. What weakness of ftell() prevents it from addressing this problem completely?

Exercise left to the reader: Solve the problem determining the length of the data consumed by fgets() by overwriting the entire buffer with a non-zero value between each call to fgets().

So with fgets() we are left with the choice of writing a lot of code and living with a line termination condition which is inconsistent with the rest of the C library, or having an arbitrary cut-off. If this is not good enough, then what are we left with? scanf() mixes parsing with reading in a way that cannot be separated, and fread() will read past the end of the string. In short, the C library leaves us with nothing. We are forced to roll our own based on top of fgetc() directly. So lets give it a shot.

So, does a getline function exist that is based on fgets (and does not truncate the input)?

Best answer

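Don't use fread. Use fgets. I take it this is a homework/class-project problem, so I'm not providing a complete answer, but if you say it's not, I'll give more advice. It is definitely possible to provide 100% of the semantics of GNU-style getline, including embedded null bytes, using purely fgets, but it requires some clever thinking.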

OK, update since this isn't homework:

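memset your buffer to '\n'.
Use fgets.
Use memchr to find the first '\n'.
If no '\n' is found, the line is longer than your buffer. Enlarge the buffer, fill the new portion with '\n', and fgets into the new portion, repeating as necessary.
If the character following the '\n' is '\0', then fgets terminated because it reached the end of a line.
Otherwise, fgets terminated because it reached EOF: the '\n' is left over from your memset, the previous character is the terminating null that fgets wrote, and the character before that is the last character of actual data read.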
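
The steps above can be sketched roughly as follows. This is only one possible reading of them, not a reference implementation: the name fgets_getline, its signature, the initial size of 128 bytes, and the doubling growth policy are all assumptions made for illustration. It returns the number of bytes read (including the '\n', if any), or -1 when nothing could be read.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and signature are illustrative assumptions).
 * Reads one line into *line, growing the buffer with realloc as needed.
 * Returns the byte count, or -1 at EOF/error with nothing read. */
long fgets_getline(char **line, size_t *cap, FILE *fp)
{
    size_t len = 0;                      /* bytes of real data stored so far */

    if (*line == NULL || *cap < 2) {     /* assumed initial size: 128 bytes */
        char *p = realloc(*line, 128);
        if (p == NULL) return -1;
        *line = p;
        *cap = 128;
    }

    for (;;) {
        char *start = *line + len;       /* unread portion of the buffer */
        size_t room = *cap - len;
        char *nl;

        if (room > INT_MAX) room = INT_MAX;   /* fgets takes an int size */

        memset(start, '\n', room);       /* pre-fill with '\n' sentinels */
        if (fgets(start, (int)room, fp) == NULL) {
            /* EOF or error before any new byte was stored */
            if (len == 0) return -1;
            (*line)[len] = '\0';         /* restore the terminator */
            return (long)len;
        }

        nl = memchr(start, '\n', room);  /* first '\n': real or sentinel */
        if (nl == NULL) {
            /* No '\n' left: fgets filled the portion with room-1 data
             * bytes plus '\0', so the line continues. */
            len += room - 1;
            if (*cap - len < 2) {        /* assumed policy: double the size */
                char *p = realloc(*line, *cap * 2);
                if (p == NULL) return -1;
                *line = p;
                *cap *= 2;
            }
            continue;
        }

        /* A real newline is immediately followed by the '\0' from fgets. */
        if (nl + 1 < start + room && nl[1] == '\0')
            return (long)(nl - *line + 1);

        /* Otherwise nl is a sentinel: EOF was hit, nl[-1] is the '\0'
         * written by fgets, and the data ends just before it. */
        return (long)(nl - *line - 1);
    }
}
```

A caller would start with a NULL pointer and a capacity of 0, call the function in a loop until it returns -1, and free the buffer afterwards. Because the length comes from the sentinel search rather than from strlen, a '\0' inside a line is counted as ordinary data.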

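You can eliminate the memset and use strlen in place of memchr if you don't care about supporting lines with embedded nulls (either way, a null will not terminate reading; it will just be part of the line you read in).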
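
For input that is known not to contain embedded nulls, a simpler variant might look like the sketch below, which drops the memset and measures each chunk with strlen. The name fgets_getline_simple, the initial size, and the doubling policy are again assumptions for illustration. If a '\0' does appear in the data, the reported length will be too short.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (illustrative name and signature).
 * Assumes lines contain no embedded '\0' bytes.
 * Returns the byte count, or -1 at EOF/error with nothing read. */
long fgets_getline_simple(char **line, size_t *cap, FILE *fp)
{
    size_t len = 0;                      /* bytes of data stored so far */

    if (*line == NULL || *cap < 2) {     /* assumed initial size: 128 bytes */
        char *p = realloc(*line, 128);
        if (p == NULL) return -1;
        *line = p;
        *cap = 128;
    }

    for (;;) {
        char *start = *line + len;
        size_t room = *cap - len;
        size_t n;

        if (room > INT_MAX) room = INT_MAX;   /* fgets takes an int size */

        if (fgets(start, (int)room, fp) == NULL) {
            if (len == 0) return -1;     /* nothing read at all */
            (*line)[len] = '\0';
            return (long)len;            /* EOF right after a full chunk */
        }

        n = strlen(start);               /* valid only without embedded '\0' */
        len += n;

        if (len > 0 && (*line)[len - 1] == '\n')
            return (long)len;            /* complete line */
        if (n + 1 < room)
            return (long)len;            /* buffer not full: EOF was hit */

        /* Buffer full without a newline: grow and keep reading. */
        if (*cap - len < 2) {
            char *p = realloc(*line, *cap * 2);
            if (p == NULL) return -1;
            *line = p;
            *cap *= 2;
        }
    }
}
```

The trade-off is one pass over the data with strlen instead of a memset plus a memchr, at the cost of mis-measuring any line that contains a '\0'.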

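There is also a way to do the same thing with fscanf and the "%123[^\n]" specifier (where 123 is your buffer limit), which gives you the flexibility to stop at characters other than a newline (a la GNU getdelim). However, it is probably slow unless your system has a very fancy scanf implementation.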
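
A minimal sketch of the fscanf idea, under stated assumptions: the function name scan_chunk, the fixed 124-byte buffer, and the use of getc to consume the delimiter are illustrative choices, not part of the answer. One detail worth knowing is that "%[" needs at least one matching character, so an empty line makes fscanf return 0 rather than 1.

```c
#include <stdio.h>

/* Hypothetical helper (illustrative name and signature).
 * Reads up to 123 non-newline characters into buf.
 * Returns 1 if a chunk was read (possibly empty), 0 at end of input.
 * *complete is set to 1 if the chunk ends its line, 0 if the line
 * was longer than the buffer and more of it remains to be read. */
int scan_chunk(FILE *fp, char buf[124], int *complete)
{
    int c;
    int rc = fscanf(fp, "%123[^\n]", buf);

    if (rc == EOF)
        return 0;                /* input failure: nothing left to read */
    if (rc == 0)
        buf[0] = '\0';           /* empty line: "%[" matched no characters */

    c = getc(fp);                /* look at what stopped the conversion */
    if (c == '\n' || c == EOF) {
        *complete = 1;           /* newline consumed, or input exhausted */
    } else {
        ungetc(c, fp);           /* line continues: put the byte back */
        *complete = 0;
    }
    return 1;
}
```

To stop at a different delimiter, the '\n' in both the scanset and the comparison would change together. A caller that needs whole lines would append chunks to a growable buffer until *complete is set.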

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/4411345/
