I have two files and am trying to compare them based on their columns.
File_1
CALL_3 CALL_1
CALL_2 CALL_5
CALL_3 CALL_2
CALL_1 CALL_4
File_2
CALL_1 GAP:A GAP:G
CALL_3 GAP:C GAP:Q GAP:R
CALL_5 GAP:R GAP:A
CALL_4 GAP:C GAP:D GAP:A GAP:W
CALL_2 GAP:C GAP:R GAP:A
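I only want to print the interactions from File_1 whose two members have at least one GAP id in common in File_2, followed by the shared GAP ids.
Expected output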
CALL_2 CALL_5 GAP:A GAP:R
CALL_3 CALL_2 GAP:C GAP:R
CALL_1 CALL_4 GAP:A
I tried the following:
awk 'NR==FNR {
  a[$1]=($1 OFS $2 OFS $3 OFS $4 OFS $5 OFS $6 OFS $7 OFS $8 OFS $9)
  next
}
($1 in a)&&($2 in a) {
  print a[$1],a[$2]
}' File_2 File_1
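This works for a fixed number of columns, but the number of columns in File_2 is not fixed (there are more than 1000 columns). How can I get the expected output?
Best answer
Could you please try the following.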
awk '
FNR==NR{
  val=$1
  $1=""
  $0=$0
  $1=$1
  a[val]=$0
  next
}
{
  val=""
  num1=split(a[$1],array1," ")
  for(i=1;i<=num1;i++){
    array3[array1[i]]
  }
  num2=split(a[$2],array2," ")
  for(i=1;i<=num2;i++){
    array4[array2[i]]
  }
  for(k in array3){
    if(k in array4){
      val=(val?val OFS:"")k
    }
  }
  if(val){
    print $0,val
  }
  val=""
  delete array1
  delete array2
  delete array3
  delete array4
}
' Input_file2 Input_file1
The output is as follows.
CALL_2 CALL_5 GAP:A GAP:R
CALL_3 CALL_2 GAP:C GAP:R
CALL_1 CALL_4 GAP:A
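One caveat about the output above: awk does not specify the traversal order of for(k in array), so the order of the shared GAP ids within a line can differ between awk implementations. The following is only a sketch of a variant with a predictable order, not the answer's own code. It assumes whitespace-separated fields and one row per id in the second file; the function name common_gaps and the arrays has and list are made up for this illustration. Shared ids are printed in the order they appear on the first id's row, so the first line reads GAP:R GAP:A rather than GAP:A GAP:R.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer):
#   common_gaps GAP_FILE PAIR_FILE
common_gaps() {
    awk '
    FNR == NR {                        # first file: id followed by its GAP ids
        list[$1] = ""
        for (i = 2; i <= NF; i++) {    # loop over NF, so any column count works
            has[$1, $i] = 1            # membership set keyed by (id, gap)
            list[$1] = list[$1] (i > 2 ? OFS : "") $i
        }
        next
    }
    {                                  # second file: one pair of ids per line
        out = ""
        n = split(list[$1], gaps, " ")
        for (i = 1; i <= n; i++)       # walk the first id row in file order
            if (($2, gaps[i]) in has)
                out = out OFS gaps[i]
        if (out != "")
            print $0 out
    }
    ' "$1" "$2"
}

# Sample data from the question, written to a scratch directory.
demo_dir=$(mktemp -d)
printf '%s\n' 'CALL_3 CALL_1' 'CALL_2 CALL_5' 'CALL_3 CALL_2' \
    'CALL_1 CALL_4' > "$demo_dir/file_1"
printf '%s\n' 'CALL_1 GAP:A GAP:G' 'CALL_3 GAP:C GAP:Q GAP:R' \
    'CALL_5 GAP:R GAP:A' 'CALL_4 GAP:C GAP:D GAP:A GAP:W' \
    'CALL_2 GAP:C GAP:R GAP:A' > "$demo_dir/file_2"

common_gaps "$demo_dir/file_2" "$demo_dir/file_1"
rm -r "$demo_dir"
```

Because the ids are read straight from the fields, the record is never modified, and the per-line temporary arrays and their delete calls are not needed.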
Explanation: a detailed, line-by-line explanation of the code above.
awk ' ##Starting awk program here.
FNR==NR{ ##Condition FNR==NR is TRUE only while the first Input_file is being read.
  val=$1 ##Creating a variable named val whose value is $1 of current line.
  $1="" ##Nullifying $1 here; awk rebuilds $0 with a leading space.
  $0=$0 ##Re-assigning current line to itself so that fields are re-split and the empty first field is dropped.
  $1=$1 ##Re-assigning $1 to itself so that $0 is rebuilt from the fields and the leading space is removed.
  a[val]=$0 ##Creating an array named a whose index is val and value is $0.
  next ##next will skip all further statements from here.
}
{
  val="" ##Nullifying variable val here.
  num1=split(a[$1],array1," ") ##Splitting the value of a[$1] into array1 and keeping the number of elements in num1.
  for(i=1;i<=num1;i++){ ##Starting a for loop from i=1 till value of num1.
    array3[array1[i]] ##Creating an array named array3 whose index is the value of array1[i].
  }
  num2=split(a[$2],array2," ") ##Splitting the value of a[$2] into array2 and keeping the number of elements in num2.
  for(i=1;i<=num2;i++){ ##Starting a for loop from i=1 till value of num2.
    array4[array2[i]] ##Creating an array named array4 whose index is the value of array2[i].
  }
  for(k in array3){ ##Traversing through array3 here.
    if(k in array4){ ##Checking condition if k which is index of array3 is present in array4 then do following.
      val=(val?val OFS:"")k ##Appending k to val, with OFS in between when val already has a value.
    }
  }
  if(val){ ##Checking condition if variable val is NOT NULL then do following.
    print $0,val ##Printing current line and variable val here.
  }
  val="" ##Nullifying variable val here.
  delete array1 ##Deleting array1 here.
  delete array2 ##Deleting array2 here.
  delete array3 ##Deleting array3 here.
  delete array4 ##Deleting array4 here.
}
' Input_file2 Input_file1 ##Mentioning Input_file names here.
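The three assignments at the top of the first block are the least obvious part of the program. As a small illustration, the sketch below prints the record and NF after each one, for a single sample row from the question. The function name show_steps is made up for this demo, and the default field separators are assumed.

```shell
#!/bin/sh
# Hypothetical demo helper: prints $0 (in brackets) and NF after each step.
show_steps() {
    awk '{
        $1 = "";  printf "[%s] NF=%d\n", $0, NF   # record rebuilt with an empty first field
        $0 = $0;  printf "[%s] NF=%d\n", $0, NF   # fields re-split, text unchanged
        $1 = $1;  printf "[%s] NF=%d\n", $0, NF   # record rebuilt from the re-split fields
    }'
}

printf '%s\n' 'CALL_1 GAP:A GAP:G' | show_steps
# prints:
# [ GAP:A GAP:G] NF=3
# [ GAP:A GAP:G] NF=2
# [GAP:A GAP:G] NF=2
```

So $1="" leaves a leading separator behind, $0=$0 makes awk forget the empty field, and $1=$1 rebuilds the line without the leading space.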
Regarding "awk - comparing the columns of two files", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58857778/