shell - Extract data between <t> </t> tags

I have the data below. How can I print the data between the two tags? I want the output in comma-separated CSV format.

My approach is to convert the data to a horizontal layout, cut after every 4th column, and then convert it back to a vertical layout.

Data in the XML file:

<?xml version="1.0" encoding="UTF-8" standalone="true"?>
-
<sst uniqueCount="12" count="12"
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
-
    <si>
        <t>"NAME"</t>
    </si>
-
    <si>
        <t>"Vikas"</t>
    </si>
-
    <si>
        <t>"Vijay"</t>
    </si>
-
    <si>
        <t>"Vilas"</t>
    </si>
-
    <si>
        <t>"AGE"</t>
    </si>
-
    <si>
        <t>"24"</t>
    </si>
-
    <si>
        <t>"34"</t>
    </si>
-
    <si>
        <t>"35"</t>
    </si>
-
    <si>
        <t>"COURSE"</t>
    </si>
-
    <si>
        <t>"MCA"</t>
    </si>
-
    <si>
        <t>"MMS"</t>
    </si>
-
    <si>
        <t>"MBA"</t>
    </si>
</sst>

I tried the following command, but it does not work:

awk '/<t/{flag=1;next}/<t/{flag=0}flag' abc.xml

I also tried the command below. It does return the data, but on a single line:

awk -F'(</*t>|</*t>)' 'NF>1{for(i=2;i<NF; i=i+2) printf("%s%s", $i, (i+1==NF)?ORS:OFS)}' OFS=',' demo.xml

I want the following output:

NAME,AGE,Course
Vikas,"25",MCA
Prabhash,"34",MBA
Arjun,"21",MMS

Best answer

Based only on the samples you have shown, you could try the following.

awk -v OFS="," '
!NF || /^-$/{ next }
/<t>"COURSE"<\/t>/{
  foundAge=foundName=""
  foundCourse=1
  count=0
}
/<t>"AGE"<\/t>/{
  foundAge=1
  foundName=foundCourse=""
  count=0
}
/<t>"NAME"<\/t>/{
  foundName=1
  count=0
}
foundAge && match($0,/>[^<]*/){
  age[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundName && match($0,/>[^<]*/){
  name[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundCourse && match($0,/>[^<]*/){
  course[++count]=substr($0,RSTART+1,RLENGTH-1)
}
END{
  for(k=1;k<=count;k++){
    if(name[k]){
      print name[k],age[k],course[k]
    }
  }
}
'  Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk -v OFS="," '                                 ##Starting awk program from here.
!NF || /^-$/{ next }                             ##If line is empty or consists only of - then skip that line.
/<t>"COURSE"<\/t>/{                              ##Checking if line has <t>"COURSE"</t> then do following.
  foundAge=foundName=""                          ##Nullifying foundAge and foundName here.
  foundCourse=1                                  ##Setting foundCourse to 1 here.
  count=0                                        ##Setting count to 0 here.
}
/<t>"AGE"<\/t>/{                                 ##Checking if line has <t>"AGE"</t> then do following.
  foundAge=1                                     ##Setting foundAge to 1 here.
  foundName=foundCourse=""                       ##Nullifying foundName and foundCourse here.
  count=0                                        ##Setting count to 0 here.
}
/<t>"NAME"<\/t>/{                                ##Checking if line has <t>"NAME"</t> then do following.
  foundName=1                                    ##Setting foundName to 1 here.
  count=0                                        ##Setting count to 0 here.
}
foundAge && match($0,/>[^<]*/){                  ##Checking if foundAge is set and using match function to get values from > to till < here.
  age[++count]=substr($0,RSTART+1,RLENGTH-1)     ##Creating age with index of count and having matched regex value here.
}
foundName && match($0,/>[^<]*/){                 ##Checking if foundName is set and using match function to get values from > to till < here.
  name[++count]=substr($0,RSTART+1,RLENGTH-1)    ##Creating name with index of count and having matched regex value here.
}
foundCourse && match($0,/>[^<]*/){               ##Checking if foundCourse is set and using match function to get values from > to till < here.
  course[++count]=substr($0,RSTART+1,RLENGTH-1)  ##Creating course with index of count and having matched regex value here.
}
END{                                             ##Starting END block of this awk program from here.
  for(k=1;k<=count;k++){                         ##Traversing through all elements of name here.
    if(name[k]){                                 ##Printing only when name[k] is non-empty, since tag-only lines store empty values.
      print name[k],age[k],course[k]             ##Printing respective array values here.
    }
  }
}
'  Input_file                                    ##Mentioning Input_file name here.
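
One detail worth spelling out, as a minimal sketch: the pattern >[^<]* also matches tag-only lines such as <si> and </si>, where it captures an empty string. Each array therefore gets empty slots between the real values, which is why the END block checks name[k] before printing. The three-line input below is a cut-down assumption, not the full file.

```shell
# Feed three sample lines to the same match()/substr() step used in the answer.
out=$(printf '%s\n' '    <si>' '        <t>"Vikas"</t>' '    </si>' |
  awk 'match($0, />[^<]*/) {
    # Print the captured text in brackets so empty captures are visible.
    printf "[%s]\n", substr($0, RSTART + 1, RLENGTH - 1)
  }')
printf '%s\n' "$out"
```

This should print [] for the <si> line, ["Vikas"] for the <t> line, and [] again for the </si> line.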

EDIT: Per the OP's comment, if all values are needed on one line, try the following:

awk -v OFS="," '
!NF || /^-$/{ next }
/<t>"COURSE"<\/t>/{
  foundAge=foundName=""
  foundCourse=1
  count=0
}
/<t>"AGE"<\/t>/{
  foundAge=1
  foundName=foundCourse=""
  count=0
}
/<t>"NAME"<\/t>/{
  foundName=1
  count=0
}
foundAge && match($0,/>[^<]*/){
  age[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundName && match($0,/>[^<]*/){
  name[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundCourse && match($0,/>[^<]*/){
  course[++count]=substr($0,RSTART+1,RLENGTH-1)
}
END{
  for(k=1;k<=count;k++){
    if(name[k]){
      nameVal=(nameVal?nameVal OFS:"")name[k]
      ageVal=(ageVal?ageVal OFS:"")age[k]
      courseVal=(courseVal?courseVal OFS:"")course[k]
    }
  }
  print nameVal,ageVal,courseVal
}
'  Input_file

This question and answer, "shell - Extract data between <t> </t> tags", come from a similar question on Stack Overflow: https://stackoverflow.com/questions/67245123/
