regex - How to print the next N lines for all instances of a string pattern?

I have a file that looks like this:

File

variableStep chrom=chr1 span=25
10076   0.84
10101   1
10126   1
10151   1
10176   1
10201   1
10226   1.72
variableStep chrom=chr1 span=25
10251   2
10276   1.16
10301   1
10326   1
10351   1
10376   1
10401   1
10426   0.28
11451   0.04
variableStep chrom=chr2 span=25
9781451     2
19781476    2
19781501    2
19781526    2
19781551    1
19781576    1
19781601    0.48
variableStep chrom=chr2 span=25
19781826    0.28
19781851    1
19781876    1
19781901    1
19781926    1
19781951    1.48
19781976    3.68
19782001    4.56
19782026    4
variableStep chrom=chr3 span=25
4813476 1
24813501    1
24813526    1
24813551    1
24813576    1.88
24813601    2
variableStep chrom=chr3 span=25
24813626    1.4
24813651    1.48
24813676    2
24813701    2
24813726    2
24813751    2
variableStep chrom=chr4 span=25
24815401    2.24
24815426    3
24815451    3
24815476    3
24815501    3
24815526    2.04
variableStep chrom=chr4 span=25
24815551    2
24815576    1.76
24815601    0.76
24815951    0.48
24815976    1
24816001    1
24816026    1
24816051    1
variableStep chrom=chr5 span=25
24817226    0.92
24817251    1.48
24817276    3
24817301    3
variableStep chrom=chr5 span=25
24817326    3
24817351    3
24817376    3
24817401    3.04
24817426    3.08

What I need

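What I need to do is, for all instances of variableStep chrom=chr1 span=25, print the n lines that follow to an output file (and the same for each of the other chromosomes, one file each, as shown below). I should mention that n is highly variable: in the actual file it can differ by anywhere from 300,000 to more than 500,000 lines.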

Desired output

1. Output file 1 for variableStep chrom=chr1 span=25

10076   0.84
10101   1
10126   1
10151   1
10176   1
10201   1
10226   1.72
10251   2
10276   1.16
10301   1
10326   1
10351   1
10376   1
10401   1
10426   0.28
11451   0.04

2. Output file 2 for variableStep chrom=chr2 span=25

9781451     2
19781476    2
19781501    2
19781526    2
19781551    1
19781576    1
19781601    0.48
19781826    0.28
19781851    1
19781876    1
19781901    1
19781926    1
19781951    1.48
19781976    3.68
19782001    4.56
19782026    4

3. Output file 3 for variableStep chrom=chr3 span=25

4813476     1
24813501    1
24813526    1
24813551    1
24813576    1.88
24813601    2
24813626    1.4
24813651    1.48
24813676    2
24813701    2
24813726    2
24813751    2

4. Output file 4 for variableStep chrom=chr4 span=25

24815401    2.24
24815426    3
24815451    3
24815476    3
24815501    3
24815526    2.04
24815551    2
24815576    1.76
24815601    0.76
24815951    0.48
24815976    1
24816001    1
24816026    1
24816051    1

5. Output file 5 for variableStep chrom=chr5 span=25

24817226    0.92
24817251    1.48
24817276    3
24817301    3
24817326    3
24817351    3
24817376    3
24817401    3.04
24817426    3.08

Background
I still consider myself a Perl novice, so the code I have written does not quite get the job done.

In fact, the code below shows the 3 approaches I tried to get this working. For the pattern variableStep chrom=chr1 span=25, I tried printing the lines after the regex match by hand.

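As far as I understand, I need a loop that runs through all the subsequent lines, and that is what I wrote for the pattern variableStep chrom=chr2 span=25. But then I realized I also need an exit mechanism, otherwise every remaining line in the file gets printed.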

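That exit pattern, written as last if /^v.*$/, is what I need to figure out, because at the moment I only print the first instance of the pattern. There are also no blank lines to exit on. If there were a blank line, this code would work fine (changed to last if /^$/). I even tried a non-digit character, as /^\D.*$/, but it did not work. What exit pattern should I use?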
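The symptom described above can be reproduced on a small in-memory sample. This minimal sketch copies the question's chr2 branch into a hypothetical sub, `original_chr2_branch`, and feeds it two chr2 blocks. Only the first block comes back: the inner loop reads the second header, `last` stops on it, and that line is never seen by the outer loop, so the exit pattern itself is not the problem.

```perl
use strict;
use warnings;

# Same logic as the question's chr2 branch, wrapped in a sub so it can
# be run against an in-memory filehandle.
sub original_chr2_branch {
    my ($fh) = @_;
    my @grabbed;
    while (<$fh>) {
        if (/^variableStep chrom=chr2 span=25/) {
            while (<$fh>) {
                last if /^v.*$/;    # the header is read here, then dropped
                push @grabbed, $_;
            }
        }
    }
    return @grabbed;
}

my $text = <<'END';
variableStep chrom=chr2 span=25
19781476    2
variableStep chrom=chr2 span=25
19781826    0.28
END
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @lines = original_chr2_branch($fh);
print scalar(@lines), " line(s) captured\n";    # prints "1 line(s) captured"
```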

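The rest of the code is my baby steps at getting the program to run; those branches only print the single line that follows the pattern match.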

Code

#Trial code to parse main file
use 5.014;
use warnings;

#Assign filename
my $file = 'trial.txt';

#Open filename ('or' binds more loosely than '||', so die really runs on failure)
open my $fh, '<', $file or die "Cannot open $file: $!";

#Open output
open OUT1, '>', 'Trial_chr1.out' or die "Cannot open Trial_chr1.out: $!";
open OUT2, '>', 'Trial_chr2.out' or die "Cannot open Trial_chr2.out: $!";
open OUT3, '>', 'Trial_chr3.out' or die "Cannot open Trial_chr3.out: $!";
open OUT4, '>', 'Trial_chr4.out' or die "Cannot open Trial_chr4.out: $!";
open OUT5, '>', 'Trial_chr5.out' or die "Cannot open Trial_chr5.out: $!";

#Read in file
while(<$fh>){
    chomp;
    if (/^variableStep chrom=chr1 span=25/){

        my $nextline1 = <$fh>;#means next line after pattern match
        my $nextline2 = <$fh>;
        my $nextline3 = <$fh>;
        my $nextline4 = <$fh>;
        my $nextline5 = <$fh>;
        my $nextline6 = <$fh>;
        my $nextline7 = <$fh>;
        print OUT1 $nextline1;
        print OUT1 $nextline2;
        print OUT1 $nextline3;
        print OUT1 $nextline4;
        print OUT1 $nextline5;
        print OUT1 $nextline6;
        print OUT1 $nextline7;

    }elsif(/^variableStep chrom=chr2 span=25/){

        my @grabbed_lines; #Initialize array to store lines after pattern match
        while (<$fh>){ #Read subsequent lines while in a loop
            last if /^v.*$/; #Break out of the loop if line encountered begins with v
            push @grabbed_lines, $_;# As long as the above condition is false, push the lines into the array
        }
        print OUT2 @grabbed_lines; # Print the grabbed lines

    }elsif(/^variableStep chrom=chr3 span=25/){
        my $nextline = <$fh>;
        print OUT3 $nextline;

    }elsif(/^variableStep chrom=chr4 span=25/){
        my $nextline = <$fh>;
        print OUT4 $nextline;
    }elsif(/^variableStep chrom=chr5 span=25/){
        my $nextline = <$fh>;
        print OUT5 $nextline;
    }
}


#Exit
exit;

Thanks for taking the time to look at my problem. I would appreciate any tips and suggestions.

Best answer

OK, I had misunderstood the n part, which is different for every match. This is tested and works:

my $found = 0;

while (<$fh>) {
    if ( $found && /^\d/ ) {
        print $_;
    }
    else {
        $found = 0;
    }

    if (/^variableStep chrom=chr2 span=25/) {
        $found = 1;
    }
}

This way it prints all the subsequent lines that start with a digit.

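Explanation:

The problem here is that every call to <$fh> reads the next line. So if you test a line's content inside an inner loop and the test fails, you must not simply carry on with the next iteration of the outer loop, because that reads a new line and the line that failed the test (here, the next variableStep header) is lost.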
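An alternative to the flag approach used in this answer is to keep the line that ended a block and examine it on the next outer iteration, a one-line "pushback". This is a minimal sketch of that technique, not the answer's method; it assumes data lines always start with a digit, and `grab_blocks` is a hypothetical name.

```perl
use strict;
use warnings;

# Return the data lines of every block whose header names $want.
# The line that ends a block (normally the next header) is stored in
# $pending instead of being discarded, so the outer loop examines it next.
sub grab_blocks {
    my ($fh, $want) = @_;
    my @grabbed;
    my $pending;
    while (defined(my $line = defined $pending ? $pending : <$fh>)) {
        undef $pending;
        next unless $line =~ /^variableStep chrom=\Q$want\E span=/;
        while (my $next = <$fh>) {
            if ($next !~ /^\d/) {    # first non-data line ends the block
                $pending = $next;    # hand it back to the outer loop
                last;
            }
            push @grabbed, $next;
        }
    }
    return @grabbed;
}

my $data = <<'END';
variableStep chrom=chr2 span=25
19781476    2
variableStep chrom=chr2 span=25
19781826    0.28
variableStep chrom=chr3 span=25
24813501    1
END
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my @chr2 = grab_blocks($in, 'chr2');
print @chr2;
```

With the pushback in place, the question's original exit test (`/^v/` or `/^\D/`) works, because the header that stops one block is no longer thrown away.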

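So I came up with this solution:

I use a flag to know which mode I am in: am I currently looking for lines to print?
The first if is entered only
1. if, in an earlier iteration of the loop, I already went through the second if and the flag was set to "1",
2. and the line starts with a digit.
When this test fails, i.e. on a line that does not start with a digit, I reset the flag, and the same line still gets checked by the second if (in case it starts with "variableStep ...").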
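The flag idea extends naturally to the question's full goal of one output file per chromosome: instead of a boolean, remember which chromosome the last header named. A minimal sketch, assuming headers have the form `variableStep chrom=NAME ...` and data lines start with a digit; `split_by_chrom` is a hypothetical name, and the `<prefix>_<chrom>.out` naming only mirrors the question's Trial_chrN.out files.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read the input once. A header line sets the current chromosome; each
# line that starts with a digit is written to that chromosome's file.
# The "<prefix>_<chrom>.out" naming is an assumption based on the question.
sub split_by_chrom {
    my ($in, $prefix) = @_;
    my %out;      # chromosome name => output filehandle
    my $chrom;    # undef until the first header has been seen
    while (my $line = <$in>) {
        if ($line =~ /^variableStep\s+chrom=(\S+)/) {
            $chrom = $1;
            # open each file once; a repeated header keeps writing to it
            unless ($out{$chrom}) {
                open $out{$chrom}, '>', "${prefix}_$chrom.out"
                    or die "Cannot open ${prefix}_$chrom.out: $!";
            }
        }
        elsif (defined $chrom && $line =~ /^\d/) {
            print { $out{$chrom} } $line;
        }
    }
    close $_ for values %out;
    return sort keys %out;
}

# Demo on a small in-memory sample, writing into a throwaway directory.
my $dir    = tempdir(CLEANUP => 1);
my $sample = <<'END';
variableStep chrom=chr1 span=25
10076   0.84
variableStep chrom=chr2 span=25
19781476    2
variableStep chrom=chr1 span=25
10251   2
END
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
my @chroms = split_by_chrom($in, "$dir/Trial");
print "wrote files for: @chroms\n";
```

On the real data this would be called with the handle for trial.txt and the prefix Trial. Because it streams line by line, memory use does not grow with n, which matters when a block runs to hundreds of thousands of lines.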

A similar question was found on Stack Overflow: https://stackoverflow.com/questions/15129713/
