I have a file that I have trimmed down to look like this:
"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"
etc.
What I ultimately want is the total amount for each city, i.e.:
"Reno","220.00"
"Lakewood","150.00"
"Altamonte Springs","100.25"
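etc.
Fair warning: the data set is not necessarily contiguous. A city may appear once here, once a thousand lines further down, and three more times at the end.
I have been trying the following awk script: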
awk -F "," '{array[$1]+=$2} END { for (i in array) {print i"," array[i]}}' test1.csv > test6.csv
The result I get looks like this:
"Matawan",0
"Bay Side",0
"Pataskala",0
"Dorothy",0
"Haymarket",0
"Myrtle Point",0
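etc. The second column is all zeros, with no quotes.
I am obviously missing something, but I don't know what to look at or where to look. What am I missing?
Thanks.
Best answer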
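Your script fails because of the double quotes: awk sees the second field as the string "40.00", quotes included, and a string that does not begin with a number counts as 0 in arithmetic.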
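To see the effect in isolation, here is a minimal sketch that feeds one row from the question through awk twice, once with the quotes left in place and once with them removed from the amount. The variable names `quoted` and `stripped` are only illustrative.

```shell
# With the quotes left in, the field starts with '"', so it converts to 0.
quoted=$(printf '%s\n' '"Reno","40.00"' | awk -F ',' '{ print $2 + 0 }')

# With the quotes removed first, the same field converts to a number.
stripped=$(printf '%s\n' '"Reno","40.00"' | awk -F ',' '{ gsub(/"/, "", $2); print $2 + 0 }')

echo "with quotes: $quoted"
echo "without quotes: $stripped"
```

The first line should report 0, which matches the column of zeros in the question.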
Do something like this:
sed 's/"//g' file.csv | awk -F "," '{array[$1]+=$2}END{for(i in array) {print "\"" i "\"" "," "\"" array[i] "\"" }}'
"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"
Regarding bash: summing multiple rows in a CSV when a field matches, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19166417/