c - Reading a file line by line, including multiple newline characters

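I'm trying to read a file of unknown size line by line, including single or multiple newline characters. For example, if my sample.txt file looks like this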

abc   cd er  dj
text

more   text


zxc cnvx

I want my strings to look like this

string1 = "abc   cd er  dj\n";
string2 = "text\n\n";
string3 = "more   text\n\n\n";
string4 = "zxc cnvx";

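I can't seem to come up with a solution that works properly. I tried the following code to get the length of each line (including the newline characters), but it gives the wrong length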

while((temp = fgetc(input)) != EOF) {
    if (temp != '\n') {
        length++;
    }
    else {
        if (temp == '\n') {
            while ((temp = fgetc(input)) == '\n') {
                length++;
            }
        }
        length = 0;
    } 
}

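I was thinking that if I could get the length of each line including the newline characters, I could malloc a string of that length and then use fread to read a string of that size, but I'm not sure whether that would work, because I would have to move the file pointer to get the next string.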

I also don't want to use a fixed-size buffer, since I don't know the length of each line. Any kind of help would be appreciated.

Best answer

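If the lines are short and there aren't many of them, you could use realloc to grow the memory as needed. Or you could allocate in smaller (or larger) chunks and reallocate whenever a chunk fills up. That is a little wasteful, but it should hopefully average out in the end.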
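As a rough illustration of the realloc approach, here is a minimal sketch. The name read_group and the starting capacity of 16 are my own choices, not anything from the question. It returns one line together with every newline that directly follows it, doubling the buffer whenever it runs out of room.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns one line plus every newline that directly
   follows it, as a malloc'd string the caller must free.
   Returns NULL at end of file or if an allocation fails. */
char *read_group(FILE *in)
{
    size_t cap = 16, len = 0;   /* 16 is an arbitrary starting capacity */
    char *buf;
    int ch, seen_newline = 0;

    assert(in != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((ch = fgetc(in)) != EOF) {
        if (seen_newline && ch != '\n') {
            ungetc(ch, in);     /* first character of the next group */
            break;
        }
        if (ch == '\n')
            seen_newline = 1;
        if (len + 2 > cap) {    /* need room for ch and the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }

    if (len == 0) {             /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Calling read_group in a loop until it returns NULL should produce the four strings from the question for the sample file, with the last one lacking a trailing newline. Doubling the capacity keeps the number of realloc calls small even for long lines.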

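If you want to use only a single allocation, find the start of the next non-empty line and save the file position (use ftell). The difference between that position and the previous start position tells you how much memory to allocate. For the reading, yes, you have to seek back and forth, but if the file isn't very big all the data will already be in the buffer, so seeking just modifies some pointers. After reading, seek to the saved position and make it the next start position.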

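Then of course you could memory-map the file. This places the file contents into your memory map as if it had all been allocated. On a 64-bit system the address space is big enough that you should be able to map multi-gigabyte files. Then you don't need to seek or allocate memory; all you do is manipulate pointers. Reading is just a simple memory copy (but since the file is already "in" memory you don't really need that, just save the pointers).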
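To make the memory-mapping idea a bit more concrete, here is a minimal sketch that assumes a POSIX system (mmap and fstat are not part of standard C). The helper names map_whole_file, group_end and print_groups are hypothetical. Once the file is mapped, finding each string is nothing more than pointer arithmetic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: map a whole file read-only. Returns NULL on failure
   or for an empty file (mmap rejects a length of zero). */
const char *map_whole_file(int fd, size_t *size)
{
    struct stat st;
    void *p;

    assert(size != NULL);
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
        return NULL;
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    *size = (size_t)st.st_size;
    return p;
}

/* Hypothetical helper: given the start of a group, return the address just
   past it, that is, past the line and every newline directly following it. */
const char *group_end(const char *p, const char *end)
{
    while (p < end && *p != '\n')   /* the text of the line */
        p++;
    while (p < end && *p == '\n')   /* its trailing newlines */
        p++;
    return p;
}

/* Print every group in brackets. The mapped data is not null-terminated,
   so the length is passed to printf explicitly. */
void print_groups(const char *data, size_t size)
{
    const char *p = data, *end = data + size;

    while (p < end) {
        const char *next = group_end(p, end);
        printf("[%.*s]", (int)(next - p), p);
        p = next;
    }
}
```

A caller would open the file, pass its descriptor to map_whole_file, and release the mapping with munmap when done. If null-terminated strings are required, each group still has to be copied out, but the lengths are known up front, so one malloc per string is enough.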


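For a very simple example of fseek and ftell, which is somewhat related to your problem, I put together this little program for you. It doesn't really do anything special, but it shows how to use the functions in a way that could serve as a prototype for the second method I discussed above.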

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // Binary mode, so that ftell positions are byte counts that can be subtracted
    FILE *file = fopen("some_text_file.txt", "rb");

    // The position after a successful open call is always zero
    long start_of_line = 0;

    int ch;

    // Read characters until we reach the end of the file or there is an error
    while ((ch = fgetc(file)) != EOF)
    {
        // Hit the *first* newline (which differs from your problem)
        if (ch == '\n')
        {
            // Found the first newline, get the current position
            // Note that the current position is the position *after* the newly read newline
            long current_position = ftell(file);

            // Allocate enough memory for the whole line, including newline
            size_t bytes_in_line = current_position - start_of_line;
            char *current_line = malloc(bytes_in_line + 1);  // +1 for the string terminator

            // Now seek back to the start of the line
            fseek(file, start_of_line, SEEK_SET);  // SEEK_SET means the offset is from the beginning of the file

            // And read the line into the buffer we just allocated
            fread(current_line, 1, bytes_in_line, file);

            // Terminate the string
            current_line[bytes_in_line] = '\0';

            // At this point, if everything went well, the file position is
            // back at current_position, because the fread call advanced the position
            // This position is the start of the next line, so we use it
            start_of_line = current_position;

            // Then do something with the line...
            printf("Read a line: %s", current_line);

            // Finally free the memory we allocated
            free(current_line);
        }

        // Continue loop reading character, to read the next line
    }

    // Did we hit end of the file, or an error?
    if (feof(file))
    {
        // End of the file it is

        // Now here's the tricky bit. Because files don't have to be terminated
        // with a newline, at this point we could actually have some data we
        // haven't read. That means we have to do the whole thing above with
        // the allocation, seeking and reading *again*

        // This is a good reason to extract that code into its own function so
        // you don't have to repeat it

        // I will not repeat the code myself. Creating a function containing it
        // and calling it is left as an exercise
    }

    fclose(file);

    return 0;
}

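Note that for brevity this program contains no error handling. It should also be noted that I haven't actually tried the program, or even tried to compile it. It was written ad hoc for this answer.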
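One possible take on the exercise left at the end of the program, again only a sketch: read_span and print_lines are hypothetical names. It moves the allocate, seek and read steps into a function with basic error checks, then reuses that function for a final line that has no newline. It assumes the stream is one where ftell positions can be subtracted, which the C standard only guarantees for binary streams.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return the bytes in [start, end) as a malloc'd
   string. On success the file position is left at end. NULL on failure. */
char *read_span(FILE *file, long start, long end)
{
    size_t n;
    char *text;

    assert(file != NULL && start >= 0 && end >= start);
    n = (size_t)(end - start);
    text = malloc(n + 1);               /* +1 for the string terminator */
    if (text == NULL)
        return NULL;
    if (fseek(file, start, SEEK_SET) != 0 || fread(text, 1, n, file) != n) {
        free(text);
        return NULL;
    }
    text[n] = '\0';
    return text;
}

/* Hypothetical helper: write every line of in to out, wrapped in brackets,
   including a last line without a newline. Returns the number of lines
   written, or -1 on error. */
long print_lines(FILE *in, FILE *out)
{
    long start = ftell(in), end, count = 0;
    int ch;
    char *line;

    if (start < 0)
        return -1;
    for (;;) {
        ch = fgetc(in);
        if (ch != '\n' && ch != EOF)
            continue;                   /* still inside the current line */
        end = ftell(in);                /* position after the newline, or at EOF */
        if (end < 0 || (ch == EOF && ferror(in)))
            return -1;
        if (end > start) {              /* there is unread data to extract */
            line = read_span(in, start, end);
            if (line == NULL)
                return -1;
            fprintf(out, "[%s]", line);
            free(line);
            count++;
            start = end;
        }
        if (ch == EOF)
            return count;
    }
}
```

Like the program above, this splits at every single newline. To get the grouping asked for in the question, the loop would also have to consume the run of newlines that follows a line before calling ftell.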

This question, c - Reading a file line by line, including multiple newline characters, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/39934530/
