string - How can I quickly count the maximum number of consecutive occurrences of a single character in a string?

I have a string like this one, but much longer:

my $a = "000000001111111111000000011111111111111111111111111111111";

I am counting the number of "1"s:

my $total_1_available = $a =~ tr/1//;

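This works very well and is very fast.
However, I also want a fast way to find the length of the longest run of consecutive "1"s.
In the example above, it would return the length of this run: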

11111111111111111111111111111111

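because that is the longest consecutive run.
So I end up with a TOTAL_COUNT and a TOTAL_CONSECUTIVE_COUNT.
I got it working with a regex that substitutes each run of 1s, measures what was replaced, and loops. It is actually fine and it works, but it "feels" wrong.
Ideally I would not modify the string at all, since I am only looking for the maximum consecutive count.
I also suspect this is not the fastest or cleanest way to do it in Perl.
Can you show me a better way and help me learn?
As requested, here is my current code: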

 my $a= "0110011001101111";
 my $total_1_available = $a =~ tr/1//;
 print "Total number of 1's = $total_1_available\n";

 my $max_c = 0;
 while ( $a=~s/(1+)/ / ) {
   $max_c = length($1) if length($1) > $max_c;
 }
 print "Consecutive count   = $max_c\n";
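Since the question asks for a way that does not modify the string, here is a minimal sketch of one option: a global match in list context returns every run of 1s without touching the original. The sub name `longest_run_of_ones` and the variable `$bits` are made up for this illustration, and nothing here has been benchmarked against the substitution loop.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Longest run of '1' characters; the input string is left untouched.
# (The sub name is hypothetical, chosen only for this sketch.)
sub longest_run_of_ones {
    my ($bits) = @_;
    # In list context, /1+/g returns every run of 1s as a separate string.
    my @runs = $bits =~ /1+/g;
    # The leading 0 covers strings with no 1s (max of an empty list is undef).
    return max(0, map { length } @runs);
}

my $bits  = "0110011001101111";
my $total = ($bits =~ tr/1//);
printf "Total number of 1's = %d\n", $total;
printf "Consecutive count   = %d\n", longest_run_of_ones($bits);
```

For the sample string this prints a total of 10 and a consecutive count of 4. Note that it builds a list of all runs in memory, so for very long strings a `while` loop over `/(1+)/g` may be the lighter choice.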

And the final code:

use strict;
use warnings;
use Benchmark ':all';
use String::Random;

## We test with 525,600 characters, as this is the length of the real string.
## Each 0 or 1 represents one minute of the year,
## i.e. the engineer minutes available in a 24 hr / 365 day year.
## And there are lots and lots of engineers.
## Hence my wish to improve the performance, and I wish to thank everyone who responded.

## There are a lot more 0s than 1s, so this is a hack to roughly simulate that.
my $test_regex = '[0][0][0][0][0][0-1][0-1][0-1][0-1][0-1]' x 52560;
my $pass       = String::Random->new;
my $string     = $pass->randregex($test_regex);

cmpthese(-1, {
    org  => sub { my $max = 0; while ($string =~ /(1+)/g) { my $len = length($1); if ($max < $len) { $max = $len } } },
    hack => sub { my $match = ""; while ($string =~ /(${match}1+)/g) { $match = $1; } length $match }
});

#                BLOWN AWAY !!!!!!

Best answer

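Using a dynamic regex can improve the speed significantly. We keep the longest run found so far in a variable, then search for that run followed by one or more additional 1s. The idea is that we only ever need to look for a run longer than the one we already have.
I used a solution that looks like this: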

sub hack {
    my $match = "";                        # original search string
    while ($string =~ /(${match}1+)/g) {   # search for $match plus 1 or more 1s
        $match = $1;                       # when found, change to new match
    }
    length $match;                         # return max length
}
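The same growing-pattern idea can be sketched for an arbitrary character rather than a hard-coded "1". This is an illustrative generalization, not part of the original answer: the sub name `max_run` and its parameters are hypothetical, and `\Q...\E` plus `quotemeta` are used so that characters such as "." are matched literally.

```perl
use strict;
use warnings;

# Length of the longest run of $char in $string, using the growing-pattern idea.
# (max_run and its argument names are hypothetical, chosen for this sketch.)
sub max_run {
    my ($string, $char) = @_;
    my $unit  = quotemeta $char;   # escape regex metacharacters such as '.'
    my $match = '';                # longest run found so far
    # Only a run strictly longer than $match can match: $match plus 1+ more.
    while ($string =~ /(\Q$match\E(?:$unit)+)/g) {
        $match = $1;               # remember the new, longer run
    }
    return length $match;
}

print max_run("0110011001101111", "1"), "\n";
print max_run("a..b...c", "."), "\n";
```

Because the pattern is rebuilt from `$match` on every iteration, the regex is recompiled each time a longer run is found; the hope is that this happens only a handful of times per string.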

I compared the `hack` sub against the original approach described by the OP, with the following results:

use strict;
use warnings;
use Benchmark ':all';

my $string = '0100100101111011010010101101101110101011111111101010100100100001011101010100' x 10_000;

cmpthese(-1, {
    org  => sub { my $max = 0; while ($string =~ /(1+)/g) { my $len = length($1); if ($max < $len) { $max = $len } } },
    hack => sub { my $match = ""; while ($string =~ /(${match}1+)/g) { $match = $1; } length $match }
});

Output:

       Rate    org   hack
org  7.31/s     --   -99%
hack 1372/s 18669%     --

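This seems astonishingly high, roughly 19,000% faster. That makes me think I have made a mistake, but I cannot think what it would be. Maybe I am missing something in the internals of the regex engine, but otherwise this would be quite an improvement over the original solution.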
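One way to address the worry about a mistake is to check that the fast version returns the same answer as the slow one. The following is a sketch of such a cross-check, not something from the original post: the sub names, the regex-free reference scan, the random test strings, and the 60% density of 1s are all assumptions made for illustration. It says nothing about speed, only about agreement.

```perl
use strict;
use warnings;

# Straightforward scan: measure every run of 1s (the "org" approach).
sub max_run_scan {
    my ($s) = @_;
    my $max = 0;
    while ($s =~ /(1+)/g) {
        my $len = length $1;
        $max = $len if $len > $max;
    }
    return $max;
}

# Growing-pattern approach (the "hack" approach from the answer).
sub max_run_growing {
    my ($s) = @_;
    my $match = "";
    while ($s =~ /(${match}1+)/g) {
        $match = $1;
    }
    return length $match;
}

# Regex-free reference: walk the string one character at a time.
sub max_run_reference {
    my ($s) = @_;
    my ($max, $run) = (0, 0);
    for my $ch (split //, $s) {
        $run = $ch eq '1' ? $run + 1 : 0;
        $max = $run if $run > $max;
    }
    return $max;
}

# Compare all three on random strings of 1 to 200 characters.
my $mismatches = 0;
for my $trial (1 .. 500) {
    my $len = 1 + int(rand(200));
    my $s   = join '', map { (rand() < 0.6) ? '1' : '0' } 1 .. $len;
    my $ref = max_run_reference($s);
    $mismatches++ if max_run_scan($s) != $ref || max_run_growing($s) != $ref;
}
print $mismatches ? "$mismatches mismatches\n" : "all three approaches agree\n";
```

If the three subs agree on every trial, the speedup in the benchmark is at least not the result of `hack` returning a wrong answer.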

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/67319032/
