c - How to split a file into two files for 400k records

Tags: c file split data-cleaning

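I am trying to split an input data file into two output files based on a label. My code is below. It works for a small number of records, but it segfaults when the input has a large number of rows.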

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

int main(int argc,char *argv[])
{
        FILE *fp,*fp1,*fp2,*fp3;
        char *filename,line[80],line1[80];
        char *token,*token1,mystr[10];

        filename=(char *)argv[1];
        fp=fopen(filename,"r");

        if(fp ==NULL) //Checking whether the command line argument was correctly or not.
        printf("There is no such file in the directory.\n");

        if(remove("sales_ok_fraud.txt") != 0) //Checking for file existence.
        perror("Error in deleting the file.\n");
        else
        printf("The existing cleansed data file is successfully deleted.\n");

        if(remove("sales_unknwn.txt") != 0) //Checking for file existence.
        perror("Error in deleting the file.\n");
        else
        printf("The existing cleansed data file is successfully deleted.\n");

        while(fgets(line,80,fp)!=NULL) //Reading each line from file to calculate the file size.
        {
                strcpy(line1,line);
                token = strtok(line,",");
                token = strtok(NULL,",");
                token = strtok(NULL,",");
                token = strtok(NULL,",");
                token = strtok(NULL,",");
                token = strtok(NULL,",");
                token1 = strtok(token,"\n");
                memcpy(mystr,&token1[0],strlen(token1)-1);
                mystr[strlen(token1)-1] = '\0';


                if( strcmp(mystr,"ok") == 0 )
                {
                        fp1=fopen("sales_ok_fraud.txt","a");//Opening the file in append mode.
                        fprintf(fp1,"%s",line1);//Writing into the file.
                        fclose(fp2);//Closing the file.

                        //printf("Inside ok - %s\n",mystr);
                }
                else if( strcmp(mystr,"fraud") == 0)
                {
                        fp2=fopen("sales_ok_fraud.txt","a");//Opening the file in append mode.
                        fprintf(fp2,"%s",line1);//Writing into the file.
                        fclose(fp2);//Closing the file.
                        //printf("Inside fraud - %s\n",mystr);
                }
                else
                {
                        fp3=fopen("sales_unknwn.txt","a");//Opening the file in append mode.
                        fprintf(fp3,"%s",line1);//Writing into the file.
                        fclose(fp3);//Closing the file.
                        //printf("This is unknown record.\n");
                }
        }

        fclose(fp);

        return 0;
}

Best answer

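I found a few problems in your code. First, strlen returns the length of the string excluding the terminating null byte, so you don't need the -1 (it drops the last character of the label, which is probably why none of the strcmp calls match):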

memcpy(mystr, &token1[0], strlen(token1));
mystr[strlen(token1)] = '\0';
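
One guess at why the -1 was there in the first place: if the input file has CRLF line endings, strtok(token,"\n") leaves a trailing '\r' on the label, and the -1 happens to remove it. If that is the case, a more explicit way to drop the line ending is strcspn. The helper name trim_eol below is made up for illustration; this is a minimal sketch, not the asker's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: truncates s at the first '\r' or '\n', if any,
 * and returns the resulting length. Works for LF, CRLF, and no line ending. */
size_t trim_eol(char *s)
{
    size_t n = strcspn(s, "\r\n");  /* index of first CR/LF, or strlen(s) */
    s[n] = '\0';
    return n;
}
```

With this, the label compares equal to "ok" or "fraud" regardless of the line ending, and no length arithmetic is needed.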

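Here I think you meant to close fp1. As written, fp1 is never closed, so every "ok" record leaks an open file, and fclose(fp2) is called on a pointer that is either uninitialized or already closed, which is undefined behavior. With enough records fopen will likely start returning NULL, and the following fprintf on a NULL stream would explain the segfault: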

fp1=fopen("sales_ok_fraud.txt","a");  //you open fp1
fprintf(fp1,"%s",line1);              //you write
fclose(fp2);//Closing the file.       //you close fp2
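
Beyond fixing the fclose call, it may be simpler to open each output file once before the loop and close it once afterwards, instead of reopening on every record. The sketch below assumes the label is the sixth comma-separated field (as in the question) and that records fit in 256 bytes; split_by_label, nth_field, LINE_MAX_LEN and LABEL_FIELD are names invented here for illustration.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256  /* assumed upper bound on record length */
#define LABEL_FIELD  6    /* label is the 6th field, as in the question */

/* Returns the n-th (1-based) comma-separated field of buf, or NULL if
 * the record has fewer fields. Modifies buf, because strtok does. */
static char *nth_field(char *buf, int n)
{
    char *tok = strtok(buf, ",");
    for (int i = 1; i < n && tok != NULL; i++)
        tok = strtok(NULL, ",");
    return tok;
}

/* Copies each record from in to known (label "ok" or "fraud") or to
 * unknown (anything else, including short records).
 * Returns the number of records written to known. */
long split_by_label(FILE *in, FILE *known, FILE *unknown)
{
    char line[LINE_MAX_LEN], copy[LINE_MAX_LEN];
    long n_known = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        strcpy(copy, line);                        /* keep line intact for output */
        char *label = nth_field(copy, LABEL_FIELD);
        if (label != NULL)
            label[strcspn(label, "\r\n")] = '\0';  /* drop the line ending */

        if (label != NULL &&
            (strcmp(label, "ok") == 0 || strcmp(label, "fraud") == 0)) {
            fputs(line, known);
            n_known++;
        } else {
            fputs(line, unknown);
        }
    }
    return n_known;
}
```

The caller would open the three streams itself, for example with fopen(argv[1],"r") and fopen("sales_ok_fraud.txt","w"), check each result against NULL before calling split_by_label, and fclose each stream exactly once afterwards. Opening the outputs in "w" mode also makes the remove calls unnecessary.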

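Note: you should also make sure that token1 cannot overflow mystr, which holds only 10 bytes, and that token is not NULL (strtok returns NULL when a record has fewer than six fields).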
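
One way to guard the copy is to check the length against the destination size before calling memcpy. The function copy_label below is a hypothetical helper written for this answer, not part of the original code; it is a sketch that rejects NULL and over-long input rather than truncating it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including the
 * terminating '\0'. Returns 0 on success, or -1 if src is NULL or too long;
 * on failure dst is left as an empty string (when dst_size > 0). */
int copy_label(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;
    dst[0] = '\0';
    if (src == NULL)
        return -1;

    size_t len = strlen(src);      /* length without the terminator */
    if (len >= dst_size)
        return -1;                 /* no room for len bytes plus '\0' */

    memcpy(dst, src, len + 1);     /* +1 copies the terminator too */
    return 0;
}
```

In the question's loop this would replace the memcpy and the manual terminator, e.g. copy_label(mystr, sizeof mystr, token1), with a failed copy treated as an unknown record.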

A similar question about "c - How to split a file into two files for 400k records" can be found on Stack Overflow: https://stackoverflow.com/questions/13421438/
