linux - How to reformat the date field of a .CSV file whose string fields contain multiple commas

Tags: linux csv sed awk cut

I have a .CSV file (file.csv) whose data values are all enclosed in double quotes. A sample of the file format:

column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"

The 9th field is a date in "DD-MMM-YY" format. I need to convert it to YYYY/MM/DD. I am trying the following code, but it does not work.

awk -F, '
 BEGIN {
 split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
 for (i=1; i<=12; i++) mdigit[month[i]]=i
 }
 { m=substr($9,4,3)
 $9 = sprintf("%02d/%02d/"20"%02d",mdigit[m],substr($9,1,2),substr($9,8,20))
 print
 }' OFS="," file.csv > temp_file.csv

After running the code above, temp_file.csv looks like this:

column1,column2,column3,column4,column5,column6,column7,Column8,00/00/2000,Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,00/00/2000,"890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455",00/00/2002, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,00/00/2000,"333","22","13-OCT-11","232"

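As far as I can tell, the problem is the commas inside the double-quoted strings, because my code treats them as field separators too. Please advise on the following: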

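1) Does double-quoting the values in every field make any difference? If it does, how can I remove the quotes from every value except the strings that contain commas?
2) How can I modify my code so that the 9th field, in "DD-MMM-YY" format, is reformatted as YYYY/MM/DD?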
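
The suspicion about embedded commas can be checked by printing how many fields awk sees with -F, and what it considers field 9. This is only a diagnostic sketch; the function name count_fields is hypothetical and not part of the original attempt.

```shell
# Diagnostic sketch: show the field count and field 9 for each line
# when splitting on every comma, exactly as `awk -F,` does.
count_fields() {
    awk -F, '{ print NR ": " NF " fields, $9 = [" $9 "]" }' "$@"
}
```

Run as `count_fields file.csv`. The first data row of the sample splits into 13 fields instead of 10, and field 9 turns out to be the tail of the author name rather than the date.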

Best answer

I strongly recommend using a proper CSV parser. For example, Text::CSV_XS in Perl does the job correctly and sanely, as in this one-liner:

perl -MText::CSV_XS -E'$csv=Text::CSV_XS->new({eol=>"\n", allow_whitespace=>1});@m=qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);@m{@m}=(1 .. @m);while(my $row=$csv->getline(ARGV)){($d,$m,$y)=split("-",$row->[8]);$row->[8]=sprintf"%04d/%02d/%02d",2000+$y,$m{$m},$d if $m{$m};$csv->print(STDOUT, $row)}' file.csv > temp_file.csv

Two-digit years are taken to mean 20YY here, as in the awk attempt in the question.
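
If installing a CSV module is not an option, the sample data allows a narrower awk workaround. Every data field is quoted, so the three-character sequence `","` can serve as the field separator instead of a bare comma, and commas inside strings no longer split fields. This is a sketch under stated assumptions, not a general CSV parser: it assumes every field in a data row is quoted, no field contains `","` itself, and years are 20YY. The function name fix_dates is hypothetical.

```shell
# Sketch: reformat field 9 from DD-MMM-YY to YYYY/MM/DD.
# Assumes all data fields are double-quoted and none contains the
# sequence "," internally; two-digit years are assumed to be 20YY.
fix_dates() {
    awk -F'","' -v OFS='","' '
    BEGIN {
        # Map month abbreviations to their numbers.
        split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
        for (i = 1; i <= 12; i++) mdigit[month[i]] = i
    }
    {
        split($9, d, "-")   # d[1]=DD, d[2]=MMM, d[3]=YY
        # Rows without a recognisable date (such as the unquoted
        # header) are printed unchanged.
        if (d[2] in mdigit)
            $9 = sprintf("20%02d/%02d/%02d", d[3], mdigit[d[2]], d[1])
        print
    }' "$@"
}
```

Run as `fix_dates file.csv > temp_file.csv`. Unlike a real CSV parser, this leaves the quoting of every field exactly as it was in the input.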

This question, "linux - How to reformat the date field of a .CSV file whose string fields contain multiple commas", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/19394901/
