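I have a pipe-delimited CSV produced by a company report. However, it has a 'Comment' field where employees type free-form text, including line breaks, and this makes the data load fail. How can I fix it with UNIX commands or a shell script?
The sample data looks like this: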
Employee ID|Time Type|Start Date|End Date|Number Of Days|Comment|Approved
90006731|Leave|04/02/2019|04/02/2019|1|annual leaves|Y
90005267|Leave|04/02/2019|04/02/2019|1||Y
90007366|Leave|04/02/2019|04/02/2019|1|* Take care of vehicle taxes
* Vehicle Repair
* Community service
* Swimming|Y
90005052|Leave|04/02/2019|04/02/2019|1|Son's field trip|Y
90006253|Death of Wife/Husband/Child/Parent|04/01/2019|04/02/2019|2||Y
90007595|Leave|04/01/2019|04/01/2019|1|family plan|Y
90004064|Leave|08/18/2020|08/21/2020|3|Dear Mas Rama,
Please kindly approve, Mas Okto was oke.
Thanks.|Y
90007072|Sick Leave Without Certificate|04/01/2019|04/01/2019|1|Sick leave due to eye swelling|Y
90004371|Sick Leave|04/01/2019|04/05/2019|4||Y
90007431|Sick Leave|04/01/2019|04/01/2019|1||Y
Desired output:
Employee ID|Time Type|Start Date|End Date|Number Of Days|Comment|Approved
90006731|Leave|04/02/2019|04/02/2019|1|annual leaves|Y
90005267|Leave|04/02/2019|04/02/2019|1||Y
90007366|Leave|04/02/2019|04/02/2019|1|* Take care of vehicle taxes * Vehicle Repair * Community service * Swimming|Y
90005052|Leave|04/02/2019|04/02/2019|1|Son's field trip|Y
90006253|Death of Wife/Husband/Child/Parent|04/01/2019|04/02/2019|2||Y
90007595|Leave|04/01/2019|04/01/2019|1|family plan|Y
90004064|Leave|08/18/2020|08/21/2020|3|Dear Mas Rama, Please kindly approve, Mas Okto was oke. Thanks.|Y
90007072|Sick Leave Without Certificate|04/01/2019|04/01/2019|1|Sick leave due to eye swelling|Y
90004371|Sick Leave|04/01/2019|04/05/2019|4||Y
90007431|Sick Leave|04/01/2019|04/01/2019|1||Y
I tried [this][1]:
awk -F\| '{ while (NF < 7 || $NF == "") { brokenline=$0; getline; $0 = brokenline $0}; print }' cu_inf_20200902tst.csv > cu_inf_20200902tst1.csv
But I got the following error:
awk: cmd. line:1: (FILENAME=cu_inf_20200902tst.csv FNR=19) fatal: grow_fields_arr: fields_arr: can't allocate 321069040 bytes of memory (Cannot allocate memory)
Any suggestions on how to solve my problem?
[1]: https://unix.stackexchange.com/questions/434979/fixing-malformed-csv-with-incorrect-new-line-chars-using-sed-or-perl-only
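Best answer
Assuming your first field (the ID) is always 8 digits and no other field is 8 digits (more precisely, no continuation line of a comment starts with 8 digits), you could try the following.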
awk '
{
printf("%s%s", (FNR>1 ? (/^[0-9]{8}/?ORS:OFS) : ""), $0)
}
END{
print ""
}' Input_file
Explanation: a commented version of the above.
awk '                                                       ##Start the awk program.
{
printf("%s%s", (FNR>1 ? (/^[0-9]{8}/?ORS:OFS) : ""), $0)    ##Before every line except the first, print ORS (newline) if the line starts with 8 digits, else OFS (space); then print the line itself without a trailing newline.
}
END{                                                        ##Start the END block of the program.
print ""                                                    ##Print the final newline that terminates the last record.
}' Input_file                                               ##Name of the input file.
Or, if you want to handle the header separately, split its condition out:
awk '
FNR==1{
print
next
}
{
printf("%s%s",$0!~/^[0-9]{8}/?OFS:(FNR>2?ORS:""),$0)
}
END{
print ""
}' Input_file
Explanation: a commented version of the above.
awk '                                                       ##Start the awk program.
FNR==1{                                                     ##If this is the first line (the header), do the following.
print                                                       ##Print the header as-is.
next                                                        ##Skip all further statements for this line.
}
{
printf("%s%s",$0!~/^[0-9]{8}/?OFS:(FNR>2?ORS:""),$0)        ##If the line does NOT start with 8 digits, print OFS (space) before it; otherwise print ORS (newline) before it when the line number is greater than 2, or nothing for line 2. Then print the line itself.
}
END{                                                        ##Start the END block of the program.
print ""                                                    ##Print the final newline that terminates the last record.
}' Input_file                                               ##Name of the input file.
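If the 8-digit assumption does not hold for your data, a variation is to join physical lines until a record has the expected number of fields. The sketch below is not part of the original answer. It assumes every record has exactly 7 fields and that comments never contain a `|`. The names `join_records`, `want`, `buf` and `parts` are made up for illustration. It also avoids the unchecked `getline` used in the question: when `getline` fails at end of file, `$0` stays unchanged and gets appended to itself on every pass, which is one plausible cause of the memory error.

```shell
#!/bin/sh
# join_records FILE: hypothetical helper, not from the original answer.
# Assumes 7 fields per record and no "|" inside comments.
join_records() {
    awk -v want=7 '
    FNR == 1 { print; next }                   # pass the header through unchanged
    {
        buf = (buf == "" ? $0 : buf " " $0)    # append this physical line to the pending record
        if (split(buf, parts, "|") >= want) {  # enough fields: the record is complete
            print buf
            buf = ""
        }
    }
    END { if (buf != "") print buf }           # flush a trailing incomplete record, if any
    ' "$1"
}

# Small demo on a few lines taken from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Employee ID|Time Type|Start Date|End Date|Number Of Days|Comment|Approved
90005267|Leave|04/02/2019|04/02/2019|1||Y
90004064|Leave|08/18/2020|08/21/2020|3|Dear Mas Rama,
Please kindly approve, Mas Okto was oke.
Thanks.|Y
EOF
fixed=$(join_records "$sample")
printf '%s\n' "$fixed"
rm -f "$sample"
```

To sanity-check a repaired file, something like `awk -F'|' 'NF != 7' fixed.csv` (with `fixed.csv` standing in for your output file) should print nothing if every line has 7 fields.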
Regarding shell - replacing characters or fixing incorrect newlines in a CSV with UNIX commands, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63716758/