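I have the data below. How can I print the data between the two tags? I want the output in comma-separated (CSV) format.
My approach is to convert the data to a horizontal layout, cut after every 4th column, and then convert it back to a vertical layout.
Data in the XML file: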
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
-
<sst uniqueCount="12" count="12"
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
-
<si>
<t>"NAME"</t>
</si>
-
<si>
<t>"Vikas"</t>
</si>
-
<si>
<t>"Vijay"</t>
</si>
-
<si>
<t>"Vilas"</t>
</si>
-
<si>
<t>"AGE"</t>
</si>
-
<si>
<t>"24"</t>
</si>
-
<si>
<t>"34"</t>
</si>
-
<si>
<t>"35"</t>
</si>
-
<si>
<t>"COURSE"</t>
</si>
-
<si>
<t>"MCA"</t>
</si>
-
<si>
<t>"MMS"</t>
</si>
-
<si>
<t>"MBA"</t>
</si>
</sst>
I tried the following command, but it does not work:
awk '/<t/{flag=1;next}/<t/{flag=0}flag' abc.xml
I also tried the command below; it returns the data, but all on a single line:
awk -F'(</*t>|</*t>)' 'NF>1{for(i=2;i<NF; i=i+2) printf("%s%s", $i, (i+1==NF)?ORS:OFS)}' OFS=',' demo.xml
I want the following output:
NAME,AGE,Course
Vikas,"25",MCA
Prabhash,"34",MBA
Arjun,"21",MMS
Best answer
With only the samples you have shown, you could try the following.
awk -v OFS="," '
!NF || /^-$/{ next }
/<t>"COURSE"<\/t>/{
foundAge=foundName=""
foundCourse=1
count=0
}
/<t>"AGE"<\/t>/{
foundAge=1
foundName=foundCourse=""
count=0
}
/<t>"NAME"<\/t>/{
foundName=1
count=0
}
foundAge && match($0,/>[^<]*/){
age[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundName && match($0,/>[^<]*/){
name[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundCourse && match($0,/>[^<]*/){
course[++count]=substr($0,RSTART+1,RLENGTH-1)
}
END{
for(k=1;k<=count;k++){
if(name[k]){
print name[k],age[k],course[k]
}
}
}
' Input_file
Explanation: a detailed, line-by-line explanation of the above.
awk -v OFS="," ' ##Starting awk program from here.
!NF || /^-$/{ next } ##Skip the line if it is empty or consists only of a - character.
/<t>"COURSE"<\/t>/{ ##Checking if line has <t>"COURSE"</t> then do following.
foundAge=foundName="" ##Nullifying foundAge and foundName here.
foundCourse=1 ##Setting foundCourse to 1 here.
count=0 ##Setting count to 0 here.
}
/<t>"AGE"<\/t>/{ ##Checking if line has <t>"AGE"</t> then do following.
foundAge=1 ##Setting foundAge to 1 here.
foundName=foundCourse="" ##Nullifying foundName and foundCourse here.
count=0 ##Setting count to 0 here.
}
/<t>"NAME"<\/t>/{ ##Checking if line has <t>"NAME"</t> then do following.
foundName=1 ##Setting foundName to 1 here.
count=0 ##Setting count to 0 here.
}
foundAge && match($0,/>[^<]*/){ ##Checking if foundAge is set and using match function to get values from > to till < here.
age[++count]=substr($0,RSTART+1,RLENGTH-1) ##Creating age with index of count and having matched regex value here.
}
foundName && match($0,/>[^<]*/){ ##Checking if foundName is set and using match function to get values from > to till < here.
name[++count]=substr($0,RSTART+1,RLENGTH-1) ##Creating name with index of count and having matched regex value here.
}
foundCourse && match($0,/>[^<]*/){ ##Checking if foundCourse is set and using match function to get values from > to till < here.
course[++count]=substr($0,RSTART+1,RLENGTH-1) ##Creating course with index of count and having matched regex value here.
}
END{ ##Starting END block of this awk program from here.
for(k=1;k<=count;k++){ ##Traversing through all elements of name here.
if(name[k]){ ##Printing only when name[k] is non-empty, since tag-only lines such as <si> store empty strings.
print name[k],age[k],course[k] ##Printing respective array values here.
}
}
}
' Input_file ##Mentioning Input_file name here.
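The match/substr extraction explained above can also be combined with the questioner's idea of cutting the values into groups and transposing them. The following is only a minimal sketch, not the answerer's method. It assumes every value sits on its own <t>...</t> line, that every group (header plus values) has exactly n entries, and that the literal double quotes should be dropped. The function name transpose_t and the file name sample.xml are hypothetical.

```shell
# Build a small sample file mirroring the layout shown in the question
# (sample.xml is a hypothetical name used only for this sketch).
for v in NAME Vikas Vijay Vilas AGE 24 34 35 COURSE MCA MMS MBA; do
  printf '<si>\n<t>"%s"</t>\n</si>\n-\n' "$v"
done > sample.xml

# transpose_t FILE [N]: collect every <t> value in document order, then print
# row r as the r-th value of each group of N values (N defaults to 4).
transpose_t() {
  awk -v n="${2:-4}" -v OFS="," '
    # Only lines holding a <t>...</t> element carry a value.
    match($0, /<t>[^<]*<\/t>/) {
      val = substr($0, RSTART + 3, RLENGTH - 7)  # drop <t> (3 chars) and </t> (4 chars)
      gsub(/"/, "", val)                         # strip the literal double quotes
      vals[++total] = val
    }
    END {
      cols = total / n   # number of groups; assumes total is a multiple of n
      for (r = 1; r <= n; r++) {
        line = ""
        for (c = 0; c < cols; c++)
          line = line (c ? OFS : "") vals[c * n + r]
        print line
      }
    }
  ' "$1"
}

transpose_t sample.xml 4
```

Unlike the answer above, this sketch does not look for the NAME, AGE and COURSE headers at all; it relies purely on the fixed group size, so it gives misaligned rows if a group is missing a value.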
EDIT: Per the OP's comment, if all values are needed on a single line, try the following:
awk -v OFS="," '
!NF || /^-$/{ next }
/<t>"COURSE"<\/t>/{
foundAge=foundName=""
foundCourse=1
count=0
}
/<t>"AGE"<\/t>/{
foundAge=1
foundName=foundCourse=""
count=0
}
/<t>"NAME"<\/t>/{
foundName=1
count=0
}
foundAge && match($0,/>[^<]*/){
age[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundName && match($0,/>[^<]*/){
name[++count]=substr($0,RSTART+1,RLENGTH-1)
}
foundCourse && match($0,/>[^<]*/){
course[++count]=substr($0,RSTART+1,RLENGTH-1)
}
END{
for(k=1;k<=count;k++){
if(name[k]){
nameVal=(nameVal?nameVal OFS:"")name[k]
ageVal=(ageVal?ageVal OFS:"")age[k]
courseVal=(courseVal?courseVal OFS:"")course[k]
}
}
print nameVal,ageVal,courseVal
}
' Input_file
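If the grouping by header is not needed and the values are simply wanted on one line in document order, a shorter sketch is possible. This is an illustration under assumptions rather than a replacement for the code above: it assumes each <t>...</t> element is on its own line, contains no < character, and that the double quotes should be removed. The names join_t_values and sample_oneline.xml are hypothetical.

```shell
# join_t_values FILE: print every <t> value from FILE on one comma-separated line.
join_t_values() {
  awk -v OFS="," '
    match($0, /<t>[^<]*<\/t>/) {
      val = substr($0, RSTART + 3, RLENGTH - 7)  # text between <t> and </t>
      gsub(/"/, "", val)                         # strip the literal double quotes
      line = line (seen++ ? OFS : "") val        # prefix a comma from the second value on
    }
    END { print line }
  ' "$1"
}

# Small sample in the same layout as the question; printf reuses the format
# string once for each argument.
printf '<si>\n<t>"%s"</t>\n</si>\n-\n' NAME Vikas Vijay AGE 24 34 > sample_oneline.xml
join_t_values sample_oneline.xml
```

Because the values are appended as they are read, the output order follows the file, which for the layout in the question is the same NAME, AGE, COURSE order that the code above produces.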
Regarding shell - extracting the data between <t> </t> tags, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67245123/