performance - Slow Perl script using nested for loops

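I have a large FASTA file (a gene sequence, one whole chromosome) in which each line contains 50 characters (the bases a, g, t and c). The file has roughly 4 million lines.

I want to reorganize the file so that every character on a line is placed on its own line in a new file. In other words, each 50-character line of the original file becomes 50 single-character lines, so the whole sequence is rewritten as a single column. Ultimately I want the sequence as one column so that I can place an adjacent column next to it containing the genomic coordinate of each base.

This is how I did it, using Perl and a pair of nested for loops.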

unless(@ARGV) {
    # $0 name of the program being executed;
    print "\n usage: $0 filename\n\n"; 
    exit;
}

# use shift to pull the first @ARGV value into $fastafile;
my $fastafile = shift; 
open(FASTA, "<$fastafile");
my @count =(<FASTA>);
close FASTA;

# print scalar @count;

for ( my $i = 0; $i < scalar @count ; $i ++ ) {

#print "$count[$i]\n\n\n\n"; 
my @seq  = split( "", $count[ $i ] ); 
print " line = $i ";
for ( my $j = 0; $j < scalar @seq; $j++ ){
    #my $count =
    print "$seq[$j]  for count = $j \n"; 

    }

}

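It seems to work, but it is slow, very slow. I would like to know whether the slowness is because the FASTA file has 4 million lines, because of my code, or both. I am looking for suggestions to speed this up. Thanks!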

Best answer

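The problem is that you are slurping the file: my @count = (<FASTA>); reads every line into memory at once. When a file this large is slurped, the process waits until all the I/O has finished before it starts processing anything. One option is to process the file line by line: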

open my $fh, '<', $fastafile or die "Error opening file: $!";

while ( my $line = <$fh> ) {
    chomp $line;    # Remove the newline from the end of each line

    my @seq = split //, $line;

    # Loop from 0 to the last index of @seq
    for my $i ( 0 .. $#seq ) {
        print "$seq[$i] for count = $i\n";
    }
}
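
The question's end goal is a single column of bases with an adjacent column of genomic coordinates. The same line-by-line approach can produce both columns in one pass. The following is only a minimal sketch under stated assumptions: the helper name write_base_column is hypothetical (it is not part of the original answer), coordinates are assumed to be 1-based, the columns are assumed to be tab-separated, and lines starting with ">" are assumed to be FASTA headers that should be skipped.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): read sequence lines
# from $in and write "base<TAB>position" to $out, one base per line.
# Assumptions: 1-based coordinates; lines starting with '>' are headers.
sub write_base_column {
    my ( $in, $out ) = @_;
    my $pos = 0;    # running coordinate across the whole sequence

    while ( my $line = <$in> ) {
        chomp $line;
        next if $line =~ /^>/;    # skip FASTA header lines

        for my $base ( split //, $line ) {
            $pos++;
            print {$out} "$base\t$pos\n";
        }
    }
    return $pos;    # total number of bases written
}

# Only runs when a filename is given on the command line.
if (@ARGV) {
    my $fastafile = shift;
    open my $fh, '<', $fastafile or die "Error opening $fastafile: $!";
    write_base_column( $fh, \*STDOUT );
    close $fh;
}
```

Assuming the script is saved as bases.pl (a hypothetical name), it could be run as perl bases.pl chr.fa > chr_column.tsv. Keeping one running counter outside the per-line loop is what makes the coordinate continue across lines instead of restarting at every 50-character row.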

A similar question about this topic (slow Perl script using nested for loops) can be found on Stack Overflow: https://stackoverflow.com/questions/20850667/
