c - Tokenizing an external file

Tags: c struct dynamic-memory-allocation strtok strcpy

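I have been stuck on how to tokenize a line and store the first token in a struct. In my case, I am trying to read lines from a file like the following.
TDV (tab-delimited values) format: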

 TN     1424325600000   dn20t1kz0xrz    67.0    0.0  0.0     0.0    101872.0    262.5665
 TN     1422770400000   dn2dcstxsf5b    23.0    0.0  100.0   0.0    100576.0    277.8087
 TN     1422792000000   dn2sdp6pbb5b    96.0    0.0  100.0   0.0    100117.0    278.49207
 TN     1422748800000   dn2fjteh8e80    6.0     0.0  100.0   0.0    100661.0    278.28485
 TN     1423396800000   dn2k0y7ffcup    14.0    0.0  100.0   0.0    100176.0    282.02142 

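As you can see, each line starts with TN, which is a state code. In the function below, I need to identify which state a line belongs to and store that in a struct.
This is the function where I am supposed to do that. I have listed what needs to happen in the function as a comment. I thought I had it right, but when I print the result, something entirely different happens: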
void analyze_file(FILE *file, struct climate_info **states, int num_states)
{
    const int line_sz = 100;
    char line[line_sz];
    int counter = 0;
    char *token;

    while (fgets(line, line_sz, file) != NULL)
    {
        /* TODO: We need to do a few things here:
         *
         *       * Tokenize the line.
         *       * Determine what state the line is for. This will be the state
         *         code, stored as our first token.
         *       * If our states array doesn't have a climate_info entry for
         *         this state, then we need to allocate memory for it and put it
         *         in the next open place in the array. Otherwise, we reuse the
         *         existing entry.
         *       * Update the climate_info structure as necessary.
         */
        struct climate_info *states = malloc(sizeof(struct climate_info)*num_states);
        token = strtok(line," \n");
        strcpy(states->code, token);
        //printf("token: %s\n", token);

        while(token)
        {

            printf("token: %s\n", token);
            token = strtok(NULL, " \t");

        }
    }
    printf("%d\n",counter);

}

This is the struct I defined:
struct climate_info
{
    char code[3];
    unsigned long num_records;
    long long millitime;
    char location[13];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
};

This is where I print the output, and where my program does not seem to see what was done in the analyze_file function:
void print_report(struct climate_info *states[], int num_states)
{
    printf("States found: ");
    int i;
    for (i = 0; i < num_states; ++i)
    {
        if (states[i] != NULL)
        {
            struct climate_info *info = states[i];
            printf("%s", info->code);
        }
    }
    printf("\n");
}

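The output should look like this: States found: TN
I am able to tokenize my string and print every token of every line, but the problem comes when I try to give values to the struct. In this line of analyze_file: strcpy(states->code, token); I am trying to take the first token, which I know is the state code, and store it in the space I allocated for the struct. As you can see from my print_report function, it does not seem to recognize that I am sending values to climate_info. My question is how to fix the analyze_file function without changing the print_report function.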

Best answer

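Much of your difficulty in figuring out how to use "TN" comes from trying to store all of the data read from each line in its own separate struct. As noted in the comments, that may be fine for reading data into a database (which gives you the ability to query all records by state abbreviation), but it makes handling the data a bit more awkward. Why?
When you store every record as an individual struct, the only relationship between the state the data belongs to and the stored information is the code member of the struct. That means that if you want to search for or print the information for, say, "TN", you must traverse every struct and check whether its code member matches "TN". Think about printing: you have to loop over each state, and then for each one loop over every struct again to pick out the records for that single state.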
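To make that cost concrete, here is a minimal sketch of the flat layout. The flat_record struct and the count_for_state helper are hypothetical names used only for illustration; neither appears in the question. Every query for one state has to walk the whole array:

```c
#include <stddef.h>
#include <string.h>

/* hypothetical flat record: the state code is the only link to the state */
struct flat_record {
    char code[3];
    double temperature;
};

/* count the records for one state; this is a full pass over all n records */
size_t count_for_state (const struct flat_record *recs, size_t n,
                        const char *code)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (strcmp (recs[i].code, code) == 0)
            hits++;

    return hits;
}
```

Reporting on all 50 states this way means 50 such passes over the full set of records.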
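Rather than storing each record as an element of one array of records, why not have an array of states where each state holds a pointer to the data for that state? That makes your num_records member far more meaningful. You then simply loop over the array of states, check whether (num_records > 0), and print num_records worth of information for that state while skipping every state with no stored data. That is a much more efficient approach.
For example, it takes little effort to rearrange your structs slightly to provide the relationship between a state and the data associated with it, e.g.: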

#include <stdio.h>
#include <stdlib.h>

/* if you need constants, either #define them or use an enum */
enum { ABRV = 2, NDATA = 8, LOC = 13, NAME = 15, MAXC = 1024 };
...
typedef struct {            /* struct holding only climate data */
    long long millitime;
    char location[LOC];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
} climate_t;

typedef struct {
    size_t  num_allocated,  /* track of how many data are allocated */
            num_records;
    climate_t *data;        /* a pointer to allocated block for data */
} statedata_t;

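But how do you relate reading "TN" from the file to getting the data stored with the correct state? That is where a lookup table comes in. If you have another simple struct containing the state name and abbreviation, you can create a simple array of those structs, and when you read an abbreviation from the file (e.g. "TN"), you can simply "look up" the index where "TN" lives in that array, and then use that index to store the information from that line at the corresponding index in the array of statedata_t.
Because the "lookup array" is constant, it can be a global array declared const. If you use multiple source files, simply define the array in one file and declare it extern in the remaining files that need it. So how do you define it? First declare a struct with the information you want in your lookup (the state name and abbreviation), then declare a constant array of them, initializing the name and abbreviation for each, e.g.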
typedef struct {
    char name[NAME+1],
        abrv[ABRV+1];
} stateabrv_t;
...
const stateabrv_t state[]  =  { { "Alabama",        "AL" },
                                { "Alaska",         "AK" },
                                { "Arizona",        "AZ" },
                                { "Arkansas",       "AR" },
                                { "California",     "CA" },
                                { "Colorado",       "CO" },
                                { "Connecticut",    "CT" },
                                { "Delaware",       "DE" },
                                { "Florida",        "FL" },
                                { "Georgia",        "GA" },
                                { "Hawaii",         "HI" },
                                { "Idaho",          "ID" },
                                { "Illinois",       "IL" },
                                { "Indiana",        "IN" },
                                { "Iowa",           "IA" },
                                { "Kansas",         "KS" },
                                { "Kentucky",       "KY" },
                                { "Louisiana",      "LA" },
                                { "Maine",          "ME" },
                                { "Maryland",       "MD" },
                                { "Massachusetts",  "MA" },
                                { "Michigan",       "MI" },
                                { "Minnesota",      "MN" },
                                { "Mississippi",    "MS" },
                                { "Missouri",       "MO" },
                                { "Montana",        "MT" },
                                { "Nebraska",       "NE" },
                                { "Nevada",         "NV" },
                                { "New Hampshire",  "NH" },
                                { "New Jersey",     "NJ" },
                                { "New Mexico",     "NM" },
                                { "New York",       "NY" },
                                { "North Carolina", "NC" },
                                { "North Dakota",   "ND" },
                                { "Ohio",           "OH" },
                                { "Oklahoma",       "OK" },
                                { "Oregon",         "OR" },
                                { "Pennsylvania",   "PA" },
                                { "Rhode Island",   "RI" },
                                { "South Carolina", "SC" },
                                { "South Dakota",   "SD" },
                                { "Tennessee",      "TN" },
                                { "Texas",          "TX" },
                                { "Utah",           "UT" },
                                { "Vermont",        "VT" },
                                { "Virginia",       "VA" },
                                { "Washington",     "WA" },
                                { "West Virginia",  "WV" },
                                { "Wisconsin",      "WI" },
                                { "Wyoming",        "WY" } };

const int nstates = sizeof state / sizeof *state;

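Now you have a simple two-way lookup. Given either a state name or an abbreviation, you can return the index where it lives in the array. Further, given a name you can look up the abbreviation, or given an abbreviation you can look up the name.
A simple lookup function returning the index could be: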
/* simple lookup function, given a code s, return index for state
 * in array of statedata_t on success, -1 otherwise.
 */
int lookupabrv (const char *s)
{
    int i = 0;

    for (; i < nstates; i++)
        if (state[i].abrv[0] == s[0] && state[i].abrv[1] == s[1])
            return i;

    return -1;
}
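The other direction of the two-way lookup (full name to abbreviation) is not shown in the answer. A minimal sketch of it might look like the following; lookupname, name_to_abrv and the three-entry demo_state table are hypothetical names for illustration, and the table is shortened so the block stands on its own:

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char name[16];
    char abrv[3];
} stateabrv_t;

/* shortened table for illustration only; the real table has 50 entries */
static const stateabrv_t demo_state[] = { { "Tennessee", "TN" },
                                          { "Texas",     "TX" },
                                          { "Utah",      "UT" } };

static const int demo_nstates = sizeof demo_state / sizeof *demo_state;

/* hypothetical helper: index of a full state name, -1 if not found */
int lookupname (const char *name)
{
    for (int i = 0; i < demo_nstates; i++)
        if (strcmp (demo_state[i].name, name) == 0)
            return i;

    return -1;
}

/* hypothetical helper: abbreviation for a full name, NULL if unknown */
const char *name_to_abrv (const char *name)
{
    int i = lookupname (name);

    return i == -1 ? NULL : demo_state[i].abrv;
}
```

Because both lookups return the same index, either one can be used to select the matching element in the array of statedata_t.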

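Now that you can find the index for a given abbreviation using the global lookup table, you can put the rest of the data handling together in main() by declaring an array of 50 statedata_t, e.g.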
int main (int argc, char **argv) {

    char buf[MAXC]; /* line buffer */
    /* array of 50 statedata_t (one for each state) */
    statedata_t stdata[sizeof state / sizeof *state] = {{.num_records = 0}};

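Now you can start reading from your file and insert_data for the proper state based on the abbreviation read from the file. A simple way to approach the read is to read "TN" into a separate array and then read the climate data into a temporary struct of type climate_t that you can pass to your insert_data function. In your insert_data function, you just look up the index (allocating or reallocating for data as required) and then assign your temporary struct of data to the memory block for that state's data. For example, your insert_data function could look something like the following: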
/* insert data for state given code and climate_t containing data */
int insert_data (statedata_t *st, const char *code, climate_t *data)
{
    int index = lookupabrv (code);  /* lookup array index */

    if (index == -1)    /* handle error */
        return 0;

    if (!st[index].num_allocated) { /* allocate data if not allocated */
        st[index].data = malloc (NDATA * sizeof *st[index].data);
        if (!st[index].data) {
            perror ("malloc-st[index].data");
            return 0;
        }
        st[index].num_allocated = NDATA;
    }

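    /* grow the block if it is full: double it via a temporary pointer */
    if (st[index].num_records == st[index].num_allocated) {
        void *tmp = realloc (st[index].data,
                    2 * st[index].num_allocated * sizeof *st[index].data);
        if (!tmp) {     /* on failure the original block is still valid */
            perror ("realloc-st[index].data");
            return 0;
        }
        st[index].data = tmp;
        st[index].num_allocated *= 2;
    }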

    /* add data for proper state index */
    st[index].data[st[index].num_records++] = *data;

    return 1;   /* return success */
}
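Growing a block when num_records reaches num_allocated is commonly done by calling realloc through a temporary pointer, so the original block is not lost if the call fails. The following is a minimal standalone sketch of that pattern, not the answer's own code: demo_store_t and demo_append are hypothetical names, a plain double stands in for climate_t, and the doubling policy is an assumption.

```c
#include <stdlib.h>

typedef struct {
    size_t num_allocated,   /* elements the block can hold */
           num_records;     /* elements currently stored */
    double *data;           /* double stands in for climate_t here */
} demo_store_t;

/* hypothetical helper: append v, doubling the capacity when the block is
 * full. Returns 1 on success, 0 if realloc fails (store left unchanged).
 */
int demo_append (demo_store_t *s, double v)
{
    if (s->num_records == s->num_allocated) {
        size_t newcap = s->num_allocated ? 2 * s->num_allocated : 8;
        /* realloc (NULL, n) behaves like malloc (n) on first use */
        void *tmp = realloc (s->data, newcap * sizeof *s->data);

        if (!tmp)           /* s->data still points to the old block */
            return 0;

        s->data = tmp;
        s->num_allocated = newcap;
    }

    s->data[s->num_records++] = v;

    return 1;
}
```

Assigning the result of realloc directly to s->data would leak the original block on failure, which is why the temporary pointer is used.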

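That is basically it. How you parse the information from each line is up to you, but for my example, given your sample data, I simply used sscanf for simplicity. Putting it all together, you could do something like the following: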
#include <stdio.h>
#include <stdlib.h>

/* if you need constants, either #define them or use an enum */
enum { ABRV = 2, NDATA = 8, LOC = 13, NAME = 15, MAXC = 1024 };

typedef struct {
    char name[NAME+1],
        abrv[ABRV+1];
} stateabrv_t;

typedef struct {            /* struct holding only climate data */
    long long millitime;
    char location[LOC];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
} climate_t;

typedef struct {
    size_t  num_allocated,  /* track of how many data are allocated */
            num_records;
    climate_t *data;        /* a pointer to allocated block for data */
} statedata_t;

const stateabrv_t state[]  =  { { "Alabama",        "AL" },
                                { "Alaska",         "AK" },
                                { "Arizona",        "AZ" },
                                { "Arkansas",       "AR" },
                                { "California",     "CA" },
                                { "Colorado",       "CO" },
                                { "Connecticut",    "CT" },
                                { "Delaware",       "DE" },
                                { "Florida",        "FL" },
                                { "Georgia",        "GA" },
                                { "Hawaii",         "HI" },
                                { "Idaho",          "ID" },
                                { "Illinois",       "IL" },
                                { "Indiana",        "IN" },
                                { "Iowa",           "IA" },
                                { "Kansas",         "KS" },
                                { "Kentucky",       "KY" },
                                { "Louisiana",      "LA" },
                                { "Maine",          "ME" },
                                { "Maryland",       "MD" },
                                { "Massachusetts",  "MA" },
                                { "Michigan",       "MI" },
                                { "Minnesota",      "MN" },
                                { "Mississippi",    "MS" },
                                { "Missouri",       "MO" },
                                { "Montana",        "MT" },
                                { "Nebraska",       "NE" },
                                { "Nevada",         "NV" },
                                { "New Hampshire",  "NH" },
                                { "New Jersey",     "NJ" },
                                { "New Mexico",     "NM" },
                                { "New York",       "NY" },
                                { "North Carolina", "NC" },
                                { "North Dakota",   "ND" },
                                { "Ohio",           "OH" },
                                { "Oklahoma",       "OK" },
                                { "Oregon",         "OR" },
                                { "Pennsylvania",   "PA" },
                                { "Rhode Island",   "RI" },
                                { "South Carolina", "SC" },
                                { "South Dakota",   "SD" },
                                { "Tennessee",      "TN" },
                                { "Texas",          "TX" },
                                { "Utah",           "UT" },
                                { "Vermont",        "VT" },
                                { "Virginia",       "VA" },
                                { "Washington",     "WA" },
                                { "West Virginia",  "WV" },
                                { "Wisconsin",      "WI" },
                                { "Wyoming",        "WY" } };

const int nstates = sizeof state / sizeof *state;

/* simple lookup function, given a code s, return index for state
 * in array of statedata_t on success, -1 otherwise.
 */
int lookupabrv (const char *s)
{
    int i = 0;

    for (; i < nstates; i++)
        if (state[i].abrv[0] == s[0] && state[i].abrv[1] == s[1])
            return i;

    return -1;
}

/* insert data for state given code and climate_t containing data */
int insert_data (statedata_t *st, const char *code, climate_t *data)
{
    int index = lookupabrv (code);  /* lookup array index */

    if (index == -1)    /* handle error */
        return 0;

    if (!st[index].num_allocated) { /* allocate data if not allocated */
        st[index].data = malloc (NDATA * sizeof *st[index].data);
        if (!st[index].data) {
            perror ("malloc-st[index].data");
            return 0;
        }
        st[index].num_allocated = NDATA;
    }

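    /* grow the block if it is full: double it via a temporary pointer */
    if (st[index].num_records == st[index].num_allocated) {
        void *tmp = realloc (st[index].data,
                    2 * st[index].num_allocated * sizeof *st[index].data);
        if (!tmp) {     /* on failure the original block is still valid */
            perror ("realloc-st[index].data");
            return 0;
        }
        st[index].data = tmp;
        st[index].num_allocated *= 2;
    }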

    /* add data for proper state index */
    st[index].data[st[index].num_records++] = *data;

    return 1;   /* return success */
}

/* print states with data collected */
void print_data (statedata_t *st)
{
    int i = 0;

    for (; i < nstates; i++) {
        if (st[i].num_records) {
            size_t j = 0;
            printf ("\n%s\n", state[i].name);
            for (; j < st[i].num_records; j++)
                printf ("  %13lld  %-12s %5.1f %5.1f %5.1f %5.1f %8.1Lf "
                        "%8.4f\n",
                        st[i].data[j].millitime, st[i].data[j].location,
                        st[i].data[j].humidity, st[i].data[j].snow,
                        st[i].data[j].cloud, st[i].data[j].lightning,
                        st[i].data[j].pressure, st[i].data[j].temperature);
        }
    }
}

/* free allocated memory */
void free_data (statedata_t *st)
{
    int i = 0;

    for (; i < nstates; i++)
        if (st[i].num_records)
            free (st[i].data);
}

int main (int argc, char **argv) {

    char buf[MAXC]; /* line buffer */
    /* array of 50 statedata_t (one for each state) */
    statedata_t stdata[sizeof state / sizeof *state] = {{.num_records = 0}};
    /* read from file given as argument (or stdin if none given) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {     /* read each line of data */
        char code[ABRV+1] = "";         /* declare storage for abbreviation */
        climate_t tmp = { .millitime = 0 }; /* declare temp struct for data */

        /* simple parse of data with sscanf */
        if (sscanf (buf, "%2s %lld %12s %lf %lf %lf %lf %Lf %lf", code,
            &tmp.millitime, tmp.location, &tmp.humidity, &tmp.snow,
            &tmp.cloud, &tmp.lightning, &tmp.pressure, &tmp.temperature)
            == 9) {
            if (!insert_data (stdata, code, &tmp))  /* insert data/validate */
                fprintf (stderr, "error: insert_data failed (%s).\n", code);
        }
        else    /* handle error */
            fprintf (stderr, "error: invalid format:\n%s\n", buf);
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    print_data (stdata);    /* print data */
    free_data (stdata);     /* free allocated memory */

    return 0;
}
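If the original interface has to stay (analyze_file filling the caller's array of struct climate_info pointers that print_report walks), note that the declaration of states inside the while loop in the question shadows the states parameter, so nothing ever reaches the caller's array and a new block is leaked on every line. A minimal strtok-based sketch of the find-or-allocate step from the TODO list might look like this. The helper names find_or_add_state and record_line are hypothetical, the struct is trimmed to two members, and it assumes the caller initialized every slot of states[] to NULL and that entries are filled from the front.

```c
#include <stdlib.h>
#include <string.h>

struct climate_info {           /* trimmed to the members used here */
    char code[3];
    unsigned long num_records;
};

/* hypothetical helper: return the entry whose code matches, creating it in
 * the first NULL slot if none exists. Returns NULL if the code is too long,
 * the array is full, or calloc fails.
 */
struct climate_info *find_or_add_state (struct climate_info **states,
                                        int num_states, const char *code)
{
    if (strlen (code) > 2)      /* would not fit in code[3] */
        return NULL;

    for (int i = 0; i < num_states; i++) {
        if (states[i] == NULL) {            /* first open slot: new state */
            states[i] = calloc (1, sizeof *states[i]);
            if (states[i] != NULL)
                strcpy (states[i]->code, code);
            return states[i];
        }
        if (strcmp (states[i]->code, code) == 0)
            return states[i];               /* reuse existing entry */
    }

    return NULL;
}

/* hypothetical helper: tokenize one line (modified in place by strtok) and
 * count it for its state. Returns 1 on success, 0 otherwise.
 */
int record_line (char *line, struct climate_info **states, int num_states)
{
    char *token = strtok (line, " \t\n");   /* first token: state code */
    struct climate_info *info;

    if (token == NULL)
        return 0;

    info = find_or_add_state (states, num_states, token);
    if (info == NULL)
        return 0;

    info->num_records++;

    return 1;
}
```

Called once per fgets line from analyze_file with the states parameter passed straight through, this writes into the caller's array, so an unchanged print_report would find the "TN" entry. The remaining tokens can then be read with further strtok (NULL, " \t\n") calls.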

Example Input File
$ cat dat/state_climate.txt
 TN     1424325600000   dn20t1kz0xrz    67.0    0.0  0.0     0.0    101872.0    262.5665
 TN     1422770400000   dn2dcstxsf5b    23.0    0.0  100.0   0.0    100576.0    277.8087
 TN     1422792000000   dn2sdp6pbb5b    96.0    0.0  100.0   0.0    100117.0    278.49207
 TN     1422748800000   dn2fjteh8e80    6.0     0.0  100.0   0.0    100661.0    278.28485
 TN     1423396800000   dn2k0y7ffcup    14.0    0.0  100.0   0.0    100176.0    282.02142

Example Use/Output
$ ./bin/state_climate <dat/state_climate.txt

Tennessee
  1424325600000  dn20t1kz0xrz  67.0   0.0   0.0   0.0 101872.0 262.5665
  1422770400000  dn2dcstxsf5b  23.0   0.0 100.0   0.0 100576.0 277.8087
  1422792000000  dn2sdp6pbb5b  96.0   0.0 100.0   0.0 100117.0 278.4921
  1422748800000  dn2fjteh8e80   6.0   0.0 100.0   0.0 100661.0 278.2849
  1423396800000  dn2k0y7ffcup  14.0   0.0 100.0   0.0 100176.0 282.0214

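Memory Use/Error Check
In any code you write that dynamically allocates memory, you have two responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block so that (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure that you do not attempt to access memory or write beyond or outside the bounds of your allocated block, that you do not read or base a conditional jump on an uninitialized value, and finally to confirm that you free all the memory you have allocated.
For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through one.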
$ valgrind ./bin/state_climate <dat/state_climate.txt
==6157== Memcheck, a memory error detector
==6157== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6157== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6157== Command: ./bin/state_climate
==6157==

Tennessee
  1424325600000  dn20t1kz0xrz  67.0   0.0   0.0   0.0 101872.0 262.5665
  1422770400000  dn2dcstxsf5b  23.0   0.0 100.0   0.0 100576.0 277.8087
  1422792000000  dn2sdp6pbb5b  96.0   0.0 100.0   0.0 100117.0 278.4921
  1422748800000  dn2fjteh8e80   6.0   0.0 100.0   0.0 100661.0 278.2849
  1423396800000  dn2k0y7ffcup  14.0   0.0 100.0   0.0 100176.0 282.0214
==6157==
==6157== HEAP SUMMARY:
==6157==     in use at exit: 0 bytes in 0 blocks
==6157==   total heap usage: 1 allocs, 1 frees, 768 bytes allocated
==6157==
==6157== All heap blocks were freed -- no leaks are possible
==6157==
==6157== For counts of detected and suppressed errors, rerun with: -v
==6157== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

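Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and think about why the change in the structs makes sense. Let me know if you have any questions.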

Regarding c - Tokenizing an external file, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53454699/
