linux - Extracting unique blocks of lines from a file using a shell script

Tags: linux bash shell unix

I have run into some problems while extracting blocks of lines from a file. Consider the following two files:

File-1
1.20/abc/this_is_test_1
perl/RRP/RRP-1.30/JEDI/JEDIExportSuccess2
exec perl/RRP/RRP-1.30/JEDI/CommonReq/confAbvExp
perl/LRP/BaseLibs/close-MMM
exec perl/LRP/BaseLibs/launchLRPCHURRTA("TYRE")
this/or/that

File-2
exec 1.20/setup/testird
exec 1.20/sql/temp/Test3
exec 1.20/setup/testxyz
exec 1.20/sql/fondle_opr_sql_labels
exec 1.20/setup/testird
exec 1.20/sql/temp/NEWTest
exec 1.20/setup/testxyz
exec 1.20/sql/fondle_opr_sql_xfer
exec 1.20/setup/testird
exec 1.20/sql/set_sec_not_0
exec 1.20/setup/testpqr
exec 1.20/sql/sql_ba_statuses_on_mult
exec perl/RRP/SetupReq/testdef_ijk
exec perl/RRP/RRP-1.30/JEDI/SetupReq/confAbvExp
exec perl/RRP/RRP-1.30/JEDI/JEDIExportSuccess1
exec perl/RRP/SetupReq/testdef_ijk
exec perl/RRP/RRP-1.30/JEDI/SetupReq/confAbvExp
exec perl/RRP/RRP-1.30/JEDI/JEDIExportSuccess2
exec perl/RRP/SetupReq/testdef_ijk
exec perl/RRP/RRP-1.30/JEDI/SetupReq/confAbvExp
exec perl/RRP/RRP-1.30/JEDI/JEDIExportSuccess3
exec 1.20/setup/testird
exec 1.20/sql/sqlmenu_purr_labl
exec 1.20/sql/est_time_at_non_drp_plc
exec 1.20/sql/half_Brd_Supply_mix_single
exec 1.20/setup/testird
exec 1.20/sql/temp/Test
exec 1.20/setup/testird
exec 1.20/sql/temp/Test2
exec perl/LRP/SetupReq/testird_LRP("LRP")
exec perl/BaseLibs/launch_client("LRP")
exec perl/LRP/LRP-classic-4.14/churrip/chorSingle
exec perl/LRP/BaseLibs/setupLRPMMMTab
exec perl/LRP/BaseLibs/launchMMM
exec perl/LRP/BaseLibs/launchLRPCHURRTA("TYRE")
#PAUSE Expand Churrip tree view & open all nodes
exec perl/LRP/LRP-classic-4.14/Corrugator/multipleSeriesWeb
exec perl/BaseLibs/ShutApp("Self Destruction System")
exec perl/LRP/BaseLibs/close-MMM
exec 1.20/setup/testmiddle
exec 1.20/sql/collective_reads
exec 1.20/setup/testinit
exec 1.20/abc/this_is_test_1
exec 1.20/abc/this_is_test_1
exec perl/LRP/SetupReq/abcDEF
exec perl/BaseLibs/launch_client("sqlC","LRP")
exec perl/LRP/LRP-perl-4.20/fireTrigger

Now, for each line in File-1, I want to extract the related block of lines from File-2. A block in File-2 is defined in one of the following two ways:

exec 1.20/setup/xxxxx
blah blah blah
blah blah blah
.
.
.
all lines till next setup line is found

For example:

exec 1.20/setup/testinit
exec 1.20/abc/this_is_test_1
exec 1.20/abc/this_is_test_1

exec perl/LRP/SetupReq/xxxxx
blah blah blah
blah blah blah
.
.
.
all lines till next setup line is found

For example:

exec perl/LRP/SetupReq/testird_LRP("LRP")
exec perl/BaseLibs/launch_client("LRP")
exec perl/LRP/LRP-classic-4.14/churrip/chorSingle
exec perl/LRP/BaseLibs/setupLRPMMMTab
exec perl/LRP/BaseLibs/launchMMM
exec perl/LRP/BaseLibs/launchLRPCHURRTA("TYRE")
#PAUSE Expand Churrip tree view & open all nodes
exec perl/LRP/LRP-classic-4.14/Corrugator/multipleSeriesWeb
exec perl/BaseLibs/ShutApp("Self Destruction System")
exec perl/LRP/BaseLibs/close-MMM
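
The two block-start forms described above can be checked with a small awk filter. This is only a sketch: the function name list_block_starts is hypothetical, and the two patterns are adapted from the awk command in the script further down (with [[:alpha:]] written as [A-Za-z] for older awk implementations).

```shell
# Print "line-number: text" for every line of a batch file that starts a new block.
# A block starts at a "setup ... test" line or at a perl/<name>/SetupReq line.
list_block_starts() {
    awk '/[Ss]etup.*[Tt]est/ || /perl\/[A-Za-z]*\/[Ss]etup[rR]eq/ { print NR ": " $0 }' "$1"
}
```

Running list_block_starts File-2 is a quick way to confirm that lines such as "exec perl/RRP/RRP-1.30/JEDI/SetupReq/confAbvExp" are not treated as block starts, because SetupReq does not directly follow perl/<name>/ there.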

So far, I have managed to extract the related blocks from File-2 with the help of the following script:

Shell Script
#set -x
FLBATCHLIST=$1
BATCHFILE=$2

TEMPDIR="/usr/tmp/tempBatchDir"
rm -rf $TEMPDIR/*

WORKFILE="$TEMPDIR/failedTestList.txt"
CPBATCHFILE="$TEMPDIR/orig.test"
TESTSETFILE="$TEMPDIR/testset.txt"
TEMPFILE="$TEMPDIR/temp.txt"
DIFFFILE="$TEMPDIR/diff.txt"

#Output
FAILEDBATCH="$TEMPDIR/FailedBatch.test"
LOGFILE="$TEMPDIR/log.txt"

createBatch ()
{

TESTNAME=$1
#First process the $CPBATCHFILE to not have any blank lines, leading and trailing whitespaces
# delete BOTH leading and trailing whitespace from each line and blank lines from file
sed -i 's/^[[:space:]]*//;s/[[:space:]]*$//g;/^$/d' $CPBATCHFILE
FOUND=0
STATUS=1
while [ $STATUS -ne "0" ]
do
        if [ ! -s $CPBATCHFILE ]; then
                echo "$CPBATCHFILE is empty" >> $LOGFILE
                STATUS=0
        fi
        awk '/[Ss]etup.*[Tt]est/ || /perl\/[[:alpha:]]*\/[Ss]etup[rR]eq/{if(b) exit; else b=1}1' $CPBATCHFILE > $TESTSETFILE
        grep -i "$TESTNAME$" $TESTSETFILE >> $LOGFILE 2>&1
        if [ $? -eq "0" ]; then
                echo "test found" >> $LOGFILE
                cat $TESTSETFILE >> $FAILEDBATCH
                FOUND=1
        fi
        TSTFLLINES=`wc -l < $TESTSETFILE`
        CPBTCHLINES=`wc -l < $CPBATCHFILE`
        DIFF=`expr $CPBTCHLINES - $TSTFLLINES`
        tail -n $DIFF $CPBATCHFILE > $DIFFFILE
        mv $DIFFFILE $CPBATCHFILE
done

if [ $FOUND -eq 0 ]; then
        echo $TESTNAME > $TEMPDIR/test.txt
        ABSTEST=$(echo $TESTNAME | sed 's/\\//g')
        echo "FATAL ERROR: Test \"$ABSTEST\" not found in batch" | tee -a $LOGFILE
fi

}

####STARTS HERE####
mkdir -p $TEMPDIR
#cat  $TEMPDIR/test.txt
#FLBATCHLIST="$TEMPDIR/test.txt"
# delete the leading "exec", BOTH leading and trailing whitespace, and blank lines from file
sed 's/^[eE][xX][eE][cC]//g;s/^[[:space:]]*//;s/[[:space:]]*$//g;/^$/d' $FLBATCHLIST > $WORKFILE

# escaping special characters like '/', '.' and '"' in the path names for better grepping
sed -i 's/\([\/\.\"]\)/\\\1/g' $WORKFILE

for fltest in $(cat $WORKFILE)
do
        echo $fltest >> $LOGFILE
        cp $BATCHFILE $CPBATCHFILE
        createBatch $fltest
done

sed -i 's/\//\\/g' $FAILEDBATCH
## Clean up
cp $FAILEDBATCH .

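The problems with this script are:

  1. Traversing File-2 once for every line of File-1 takes a while. I would like to know whether there is a better solution in which File-2 only has to be traversed once.

  2. The script does solve my problem, but the file it leaves me with contains duplicate blocks of lines. I would like to know whether there is a way to remove the duplicate blocks.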

This is the output I get when I run the script:

exec 1.20\setup\testinit
exec 1.20\abc\this_is_test_1
exec 1.20\abc\this_is_test_1
exec perl\RRP\SetupReq\testdef_ijk
exec perl\RRP\RRP-1.30\JEDI\SetupReq\confAbvExp
exec perl\RRP\RRP-1.30\JEDI\JEDIExportSuccess2
exec perl\RRP\SetupReq\testdef_ijk
exec perl\RRP\RRP-1.30\JEDI\SetupReq\confAbvExp
exec perl\RRP\RRP-1.30\JEDI\JEDIExportSuccess1
exec perl\RRP\SetupReq\testdef_ijk
exec perl\RRP\RRP-1.30\JEDI\SetupReq\confAbvExp
exec perl\RRP\RRP-1.30\JEDI\JEDIExportSuccess2
exec perl\RRP\SetupReq\testdef_ijk
exec perl\RRP\RRP-1.30\JEDI\SetupReq\confAbvExp
exec perl\RRP\RRP-1.30\JEDI\JEDIExportSuccess3
exec perl\LRP\SetupReq\testird_LRP("LRP")
exec perl\BaseLibs\launch_client("LRP")
exec perl\LRP\LRP-classic-4.14\churrip\chorSingle
exec perl\LRP\BaseLibs\setupLRPMMMTab
exec perl\LRP\BaseLibs\launchMMM
exec perl\LRP\BaseLibs\launchLRPCHURRTA("TYRE")
#PAUSE Expand Churrip tree view & open all nodes
exec perl\LRP\LRP-classic-4.14\Corrugator\multipleSeriesWeb
exec perl\BaseLibs\ShutApp("Self Destruction System")
exec perl\LRP\BaseLibs\close-MMM
exec perl\LRP\SetupReq\testird_LRP("LRP")
exec perl\BaseLibs\launch_client("LRP")
exec perl\LRP\LRP-classic-4.14\churrip\chorSingle
exec perl\LRP\BaseLibs\setupLRPMMMTab
exec perl\LRP\BaseLibs\launchMMM
exec perl\LRP\BaseLibs\launchLRPCHURRTA("TYRE")
#PAUSE Expand Churrip tree view & open all nodes
exec perl\LRP\LRP-classic-4.14\Corrugator\multipleSeriesWeb
exec perl\BaseLibs\ShutApp("Self Destruction System")
exec perl\LRP\BaseLibs\close-MMM

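I tried searching the web for an answer, but could not find one that fits my needs.

Given File-1 and File-2, this is what I would like my script to output (I have listed the expected output for each line in FILE-1):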

For line "1.20/abc/this_is_test_1" in FILE-1
Output
exec 1.20/setup/testinit
exec 1.20/abc/this_is_test_1
exec 1.20/abc/this_is_test_1

For line "perl/RRP/RRP-1.30/JEDI/JEDIExportSuccess2" in FILE-1
Output
exec perl/RRP/SetupReq/testdef_ijk
exec perl/RRP/RRP-1.30/JEDI/SetupReq/confAbvExp
exec perl/RRP/RRP-1.30/JEDI/JEDIExportSuccess2

For line "exec perl/RRP/RRP-1.30/JEDI/CommonReq/confAbvExp" in FILE-1
Output
Do nothing, as no line in FILE-2 matches this one

For line "perl/LRP/BaseLibs/close-MMM" in FILE-1
Output
exec perl/LRP/SetupReq/testird_LRP("LRP")
exec perl/BaseLibs/launch_client("LRP")
exec perl/LRP/LRP-classic-4.14/churrip/chorSingle
exec perl/LRP/BaseLibs/setupLRPMMMTab
exec perl/LRP/BaseLibs/launchMMM
exec perl/LRP/BaseLibs/launchLRPCHURRTA("TYRE")
#PAUSE Expand Churrip tree view & open all nodes
exec perl/LRP/LRP-classic-4.14/Corrugator/multipleSeriesWeb
exec perl/BaseLibs/ShutApp("Self Destruction System")
exec perl/LRP/BaseLibs/close-MMM

For line "exec perl/LRP/BaseLibs/launchLRPCHURRTA("TYRE")" in FILE-1
Output
Do nothing, as it would generate the same block as the line "perl/LRP/BaseLibs/close-MMM" in FILE-1 did

For Line "this/or/that" in FILE-1
Output
Do nothing, as no line in FILE-2 matches this one

So my final output should look something like this (the order of the blocks does not matter):

exec 1.20/setup/testinit
exec 1.20/abc/this_is_test_1
exec 1.20/abc/this_is_test_1

exec perl/RRP/SetupReq/testdef_ijk
exec perl/RRP/RRP-1.30/JEDI/SetupReq/confAbvExp
exec perl/RRP/RRP-1.30/JEDI/JEDIExportSuccess2

exec perl/LRP/SetupReq/testird_LRP("LRP")
exec perl/BaseLibs/launch_client("LRP")
exec perl/LRP/LRP-classic-4.14/churrip/chorSingle
exec perl/LRP/BaseLibs/setupLRPMMMTab
exec perl/LRP/BaseLibs/launchMMM
exec perl/LRP/BaseLibs/launchLRPCHURRTA("TYRE")
#PAUSE Expand Churrip tree view & open all nodes
exec perl/LRP/LRP-classic-4.14/Corrugator/multipleSeriesWeb
exec perl/BaseLibs/ShutApp("Self Destruction System")
exec perl/LRP/BaseLibs/close-MMM

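It would be great if someone could give me some pointers on how to proceed. And yes, I forgot to mention: this is not a homework question :-).

Many thanks.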

Best answer

Assuming the line order does not matter, you can remove duplicate lines from a file at the command prompt with:

sort filename | uniq
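
If the line order does matter, a common awk idiom removes duplicate lines without sorting. The sketch below wraps it in a hypothetical function named dedupe_lines. Note that, like sort | uniq, it works on single lines and not on whole blocks, so it would also drop a line that is legitimately repeated inside one block.

```shell
# Remove duplicate lines while keeping the first occurrence of each line in place.
# seen[$0]++ evaluates to 0 the first time a line appears, so !seen[$0]++ is true only then.
dedupe_lines() {
    awk '!seen[$0]++' "$1"
}
```

Usage would be along the lines of dedupe_lines filename > filename.unique.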

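To find out which lines exist in both files, I use a perl script that builds a hash (or associative array, if you prefer). I scan file A and add each line to the hash, using the line as the key and setting the value to 1. Then I do the same for file B, but set the value to 2, or add 2 if the key already exists. This way each file is traversed only once, and at the end I know that a key with value 1 exists only in file A, a key with value 2 exists only in file B, and a key with value 3 exists in both.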
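
For readers who would rather stay in the shell, the same marking scheme can be sketched with awk's associative arrays. The function name classify_lines is hypothetical, and the flag is set rather than added so that a line repeated within one file does not push the value past 3.

```shell
# Classify every distinct line of two files, reading each file once.
# Flag 1 = only in the first file, 2 = only in the second, 3 = in both.
classify_lines() {
    awk '
        FILENAME == ARGV[1] { flag[$0] = 1; next }                # lines of the first file
        { flag[$0] = (flag[$0] == 1 || flag[$0] == 3) ? 3 : 2 }   # lines of the second file
        END { for (line in flag) print flag[line] "\t" line }
    ' "$1" "$2" | sort
}
```

classify_lines fileA fileB prints one tab-separated "flag line" pair per distinct line. The final sort is only there to make the output order stable, since awk's for-in order is unspecified.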

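Edit: I found some perl code from a project that does exactly what I described above. In this code I am only looking for the differences, but it should be easy to modify it to suit your needs.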

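use strict;
use warnings;

# Usage: perl script.pl QC-LIST-FILE X-LIST-FILE
@ARGV == 2 or die "Usage: $0 QC-LIST-FILE X-LIST-FILE\n";

# Read a file and return its lines without trailing newlines.
sub read_lines {
  my ($path) = @_;
  open(my $fh, '<', $path) or die "Cannot open $path: $!";
  chomp(my @lines = <$fh>);
  return @lines;
}
my @qlist = read_lines($ARGV[0]);
my @xlist = read_lines($ARGV[1]);

# Value 2 marks the QC-list, value 1 marks the x-list. Using |= instead of +=
# keeps the result at 1, 2 or 3 even if a list contains duplicate lines.
my %found;
foreach my $item (@qlist) { $found{$item} |= 2 }
foreach my $item (@xlist) { $found{$item} |= 1 }

foreach my $item (keys(%found))
{
  if    ($found{$item} == 3)
  {
    # It's in both files. Not doing anything.
  }
  elsif ($found{$item} == 2)
  {
    print "$item found in the QC-list, but not the x-list.\n";
  }
  elsif ($found{$item} == 1)
  {
    print "$item found in the x-list, but not the QC-list.\n";
  }
}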
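
Applied to the block problem in the question, the same lookup-table idea might look like the following sketch. It is not the asker's script and makes several assumptions: the function name extract_blocks is hypothetical, a File-1 line is taken to match a File-2 line when both are identical after stripping an optional leading "exec" and surrounding whitespace, blocks start at the two patterns used in the question's awk command (with [[:alpha:]] written as [A-Za-z]), and two blocks count as duplicates only when their text is identical.

```shell
# extract_blocks WANTED_FILE BATCH_FILE
# Loads WANTED_FILE (File-1) into a lookup table, then walks BATCH_FILE (File-2) once,
# printing each block that contains a wanted line, and each distinct block only once.
extract_blocks() {
    awk '
        # Strip an optional leading "exec" and surrounding whitespace.
        function normalize(s) {
            sub(/^[ \t]*([eE][xX][eE][cC][ \t]+)?/, "", s)
            sub(/[ \t]+$/, "", s)
            return s
        }
        # Print the buffered block if it is wanted and was not printed before.
        function emit_block() {
            if (wanted && !(block in printed)) {
                if (count++) print ""          # blank line between blocks
                printf "%s", block
                printed[block] = 1
            }
            block = ""
            wanted = 0
        }
        # First file: remember every non-empty normalized line.
        FILENAME == ARGV[1] { key = normalize($0); if (key != "") want[key] = 1; next }
        # Second file: skip blank lines, close the previous block at each block start.
        /^[ \t]*$/ { next }
        /[Ss]etup.*[Tt]est/ || /perl\/[A-Za-z]*\/[Ss]etup[rR]eq/ { emit_block() }
        {
            block = block $0 "\n"
            if (normalize($0) in want) wanted = 1
        }
        END { emit_block() }
    ' "$1" "$2"
}
```

A call such as extract_blocks File-1 File-2 > FailedBatch.test reads File-2 a single time regardless of how many lines File-1 has. Blocks come out in the order in which they appear in File-2, which is acceptable here since the question states that block order does not matter.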

Regarding "linux - Extracting unique blocks of lines from a file using a shell script", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13713032/
