linux - Awk to get a .csv column containing commas and newlines

Tags: linux bash awk


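The data in one column of my .csv sometimes contains commas and newlines. When the data contains a comma, I enclose the whole string in double quotes. How can I parse that column's output into a .txt file while accounting for the newlines and commas?

Sample data that does not work with my command: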

,"This is some text with a , in it.", #data with commas are enclosed in double quotes

,line 1 of data
line 2 of data, #data with a couple of newlines

,"Data that may a have , in it and
also be on a newline as well.",

Here is what I have so far:

awk -F "\"*,\"*" '{print $4}' file.csv > column_output.txt

Best answer

$ cat decsv.awk
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," }
{
    # create strings that cannot exist in the input to map escaped quotes to
    gsub(/a/,"aA")
    gsub(/\\"/,"aB")
    gsub(/""/,"aC")

    # prepend previous incomplete record segment if any
    $0 = prev $0
    numq = gsub(/"/,"&")
    if ( numq % 2 ) {
        # this is inside double quotes so incomplete record
        prev = $0 RT
        next
    }
    prev = ""

    for (i=1;i<=NF;i++) {
        # map the replacement strings back to their original values
        gsub(/aC/,"\"\"",$i)
        gsub(/aB/,"\\\"",$i)
        gsub(/aA/,"a",$i)
    }

    printf "Record %d:\n", ++recNr
    for (i=0;i<=NF;i++) {
        printf "\t$%d=<%s>\n", i, $i
    }
    print "#######"
}

$ awk -f decsv.awk file
Record 1:
        $0=<,"This is some text with a , in it.", #data with commas are enclosed in double quotes>
        $1=<>
        $2=<"This is some text with a , in it.">
        $3=< #data with commas are enclosed in double quotes>
#######
Record 2:
        $0=<,"line 1 of data
line 2 of data", #data with a couple of newlines>
        $1=<>
        $2=<"line 1 of data
line 2 of data">
        $3=< #data with a couple of newlines>
#######
Record 3:
        $0=<,"Data that may a have , in it and
also be on a newline as well.",>
        $1=<>
        $2=<"Data that may a have , in it and
also be on a newline as well.">
        $3=<>
#######
Record 4:
        $0=<,"Data that \"may\" a have ""quote"" in it and
also be on a newline as well.",>
        $1=<>
        $2=<"Data that \"may\" a have ""quote"" in it and
also be on a newline as well.">
        $3=<>
#######

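The above uses GNU awk for FPAT and RT. I'm not aware of any CSV format that allows a newline in the middle of a field that is not enclosed in quotes (if one did, you could never tell where any record ends), so the script does not allow for that. The above was run on this input file: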

$ cat file
,"This is some text with a , in it.", #data with commas are enclosed in double quotes
,"line 1 of data
line 2 of data", #data with a couple of newlines
,"Data that may a have , in it and
also be on a newline as well.",
,"Data that \"may\" a have ""quote"" in it and
also be on a newline as well.",
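
For pulling out just one column, as the question asks, the same two ideas (glue lines together until the double quotes balance, then split on commas that are outside quotes) can be sketched without FPAT or RT so it should run in any POSIX awk. This is a swapped-in character-by-character splitter, not the script above. The function name csv_col is hypothetical, and the sketch assumes quotes inside a quoted field are either doubled ("") or otherwise occur in pairs; a single backslash-escaped quote would throw off the quote count.

```shell
#!/bin/sh
# csv_col N [FILE...]: print field N of every CSV record.
# Hypothetical helper; assumes embedded quotes are doubled ("") or paired.
csv_col() {
    want=$1
    shift
    awk -v want="$want" '
    # Return field number "want" of rec, splitting on commas outside quotes.
    function field(rec, want,    i, n, len, c, inq, out) {
        n = 1
        len = length(rec)
        for (i = 1; i <= len; i++) {
            c = substr(rec, i, 1)
            if (c == "\"") {
                inq = !inq                  # entering or leaving a quoted section
            } else if (c == "," && !inq) {
                if (n == want) return out   # reached the end of the wanted field
                n++
                continue
            }
            if (n == want) out = out c      # surrounding quotes are kept
        }
        return (n == want) ? out : ""
    }
    {
        # Glue physical lines together until the quote count is even.
        rec = (pending ? rec "\n" $0 : $0)
        tmp = rec
        pending = gsub(/"/, "&", tmp) % 2
        if (!pending) print field(rec, want + 0)
    }
    ' "$@"
}
```

With the function defined, something like `csv_col 4 file.csv > column_output.txt` would write the fourth column to a text file. A record still inside an open quote at end of input is dropped silently, as in the script above.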

Source: a similar question on Stack Overflow, "linux - Awk to get a .csv column containing commas and newlines": https://stackoverflow.com/questions/38731863/
