c - Reading a file line by line in C, mostly with system calls

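I am trying to read and parse a file line by line. I only want to use simple system calls (read, open, close, ...) rather than fgets or getc, because I want to learn the fundamentals. (I have looked at answers to similar questions, but they all use fgets and the like.)

Here is what I have so far: a function I wrote that stores 1024 characters of the file in a buffer.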

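#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    const char *filename = "file.txt";
    int fd = open(filename, O_RDONLY);
    char *buffer = malloc(sizeof (char) * 1024);
    if (fd == -1 || buffer == NULL) return 1;

    ssize_t n = read(fd, buffer, 1024);
    if (n > 0)
        printf("%.*s", (int)n, buffer);  /* read() does not nul-terminate */
    close(fd);
    free(buffer);
    return 0;
}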

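How would I stop at '\n', for example? I know that once I know where to stop, I can use lseek with the right offset to continue reading the file from where I left off.

I do not want to store the whole file in my buffer and then parse it. I want to put one line into my buffer, parse that line, then reallocate my buffer and continue reading the file.

I was thinking of something like this, but I feel it is poorly optimized, and I am not sure where to add the lseek afterwards: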

char *line = malloc(sizeof (char) * 1024);
char c;
int i = 0;
    /* read one byte at a time until newline, EOF, or 'line' is full */
    while (i < 1023 && read(fd, &c, 1) == 1 && c != '\n') {
        line[i] = c;
        ++i;
    }
    line[i] = '\0';


  /* lseek somewhere after, probably should make 2 loops
   ** One loop until the file is completely read
   ** Another loop inside that checks if the end of the line is reached
   ** At the end of the second loop, lseek to where we left off
   */

Thanks :)

Edit: clarified the title.

Best answer

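If you are going to use read to read one line at a time (which is what fgets and getline are for), you must keep track of the offset within the file after you locate each '\n'. Then it is just a matter of reading one line at a time, starting the next read at the offset that follows the current line.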
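As a rough illustration of that offset bookkeeping, here is a minimal sketch that keeps one descriptor open and uses POSIX pread, which reads at an explicit offset without moving the descriptor's file position. This is an alternative to an lseek followed by read, not the approach used in the rest of this answer. The names pread_line and print_lines and the 128-byte line buffer are made up for the illustration. A line longer than the buffer comes back in several pieces.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes pread() under strict -std=c11 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one line from the already-open descriptor
 * 'fd', starting at '*offset'. Stores at most sz - 1 bytes in 'buf',
 * nul-terminates it, and advances '*offset' past the bytes consumed
 * (including the newline, if one was found).
 * Returns the line length, or -1 on error or at end of file. */
ssize_t pread_line (int fd, char *buf, size_t sz, off_t *offset)
{
    if (buf == NULL || offset == NULL || sz < 2)
        return -1;

    /* read at an explicit offset; no lseek() is needed */
    ssize_t nchr = pread (fd, buf, sz - 1, *offset);
    if (nchr <= 0)
        return -1;                          /* error (-1) or EOF (0) */

    char *nl = memchr (buf, '\n', (size_t)nchr);
    ssize_t len = nl ? (ssize_t)(nl - buf) : nchr;

    buf[len] = '\0';                        /* cut at the newline */
    *offset += len + (nl ? 1 : 0);          /* skip the newline itself */
    return len;
}

/* Hypothetical driver: print every line of 'fn' and return the number
 * of lines (or pieces of over-long lines) seen, -1 if open fails. */
long print_lines (const char *fn)
{
    int fd = open (fn, O_RDONLY);
    if (fd == -1)
        return -1;

    char line[128];
    off_t offset = 0;
    ssize_t len;
    long n = 0;

    while ((len = pread_line (fd, line, sizeof line, &offset)) != -1)
        printf (" line[%2ld] : %s (%zd chars)\n", n++, line, len);

    close (fd);
    return n;
}
```

Each call still reads a full buffer and discards everything after the first newline, so the same bytes are read more than once. That is the price of tracking only an offset and keeping no other state between calls.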

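I understand wanting to be able to use the low-level functions as well as fgets and getline. What you will find is that you basically end up re-coding, less efficiently, what fgets and getline already do. It is certainly good practice for learning, though. Here is a short example: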

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define BUFSZ 128

ssize_t readline (char *buf, size_t sz, char *fn, off_t *offset);

int main (int argc, char **argv) {

    if (argc < 2) return 1;

    char line[BUFSZ] = {0};
    off_t offset = 0;
    ssize_t len = 0;
    size_t i = 0;

    /* using open/read, read each line in file into 'line' */
    while ((len = readline (line, BUFSZ, argv[1], &offset)) != -1)
        printf (" line[%2zu] : %s (%zd chars)\n", i++, line, len);

    return 0;
}

/* read up to 'sz - 1' bytes from file 'fn' beginning at file
   'offset', storing all chars in 'buf', where 'buf' is terminated
   at the first newline found. On success, returns number of
   characters read, -1 on error or EOF with 0 chars read.
 */
ssize_t readline (char *buf, size_t sz, char *fn, off_t *offset)
{
    int fd = open (fn, O_RDONLY);
    if (fd == -1) {
        fprintf (stderr, "%s() error: file open failed '%s'.\n",
                __func__, fn);
        return -1;
    }

    ssize_t nchr = 0;
    ssize_t idx = 0;
    char *p = NULL;

    /* position fd & read line, leaving room for the terminating nul */
    if ((nchr = lseek (fd, *offset, SEEK_SET)) != -1)
        nchr = read (fd, buf, sz - 1);
    close (fd);

    if (nchr == -1) {   /* seek or read error */
        fprintf (stderr, "%s() error: read failure in '%s'.\n",
                __func__, fn);
        return nchr;
    }

    /* end of file - no chars read
    (not an error, but return -1 )*/
    if (nchr == 0) return -1;

    p = buf;    /* check each char */
    while (idx < nchr && *p != '\n') p++, idx++;
    *p = 0;     /* p <= buf + sz - 1, so this stays in bounds */

    if (idx == nchr) {  /* newline not found  */
        *offset += nchr;

        /* short read: last line of file lacks a newline, return its
           length; full buffer: line was truncated, return 0 */
        return nchr < (ssize_t)sz - 1 ? nchr : 0;
    }

    *offset += idx + 1;

    return idx;
}

Example input

The following data files are identical, except that the second contains a blank line between each line of text.

$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.

$ cat dat/captnjack2.txt
This is a tale

Of Captain Jack Sparrow

A Pirate So Brave

On the Seven Seas.

Output

$ ./bin/readfile dat/captnjack.txt
 line[ 0] : This is a tale (14 chars)
 line[ 1] : Of Captain Jack Sparrow (23 chars)
 line[ 2] : A Pirate So Brave (17 chars)
 line[ 3] : On the Seven Seas. (18 chars)

$ ./bin/readfile dat/captnjack2.txt
 line[ 0] : This is a tale (14 chars)
 line[ 1] :  (0 chars)
 line[ 2] : Of Captain Jack Sparrow (23 chars)
 line[ 3] :  (0 chars)
 line[ 4] : A Pirate So Brave (17 chars)
 line[ 5] :  (0 chars)
 line[ 6] : On the Seven Seas. (18 chars)

Regarding "c - Reading a file line by line in C, mostly with system calls", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33106505/
