c# - How can C#'s string.IndexOf run so fast, 10 times faster than an ordinary for-loop find?

I have a very long string (60 MB in size), and I need to find out how many pairs of '<' and '>' it contains.

I first tried my own approach:

        char pre = '!';
        int match1 = 0;
        for (int j = 0; j < html.Length; j++)
        {
            char c = html[j];
            if (pre == '<' && c == '>') //find a match
            {
                pre = '!';
                match1++;
            }
            else if (pre == '!' && c == '<')
                pre = '<';
        }

The code above takes about 1000 ms to run on my string.

Then I tried using string.IndexOf:

        int match2 = 0;
        int index = -1;
        do
        {
            index = html.IndexOf('<', index + 1);
            if (index != -1) // find a match
            {
                index = html.IndexOf('>', index + 1);
                if (index != -1)
                   match2++;
            }
        } while (index != -1);

The code above takes only about 150 ms.

What makes string.IndexOf run so fast?

Can anyone enlighten me?

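Edit

OK, following @BrokenGlass's answer, I modified both versions so that they no longer check for pairs and instead just count how many '<' characters are in the string.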

        for (int j = 0; j < html.Length; j++)
        {
            char c = html[j];
            if (c == '<') // count '<' so this matches the IndexOf version
            {
                match1++;
            }
        }

The code above takes about 760 ms.

Using IndexOf:

        int index = -1;
        do
        {
            index = html.IndexOf('<', index + 1);
            if (index != -1)
            {
                match2++;
            }
        } while (index != -1);

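The code above takes about 132 ms. Still much, much faster.

Edit 2

After reading @Jeffrey Sax's comment, I realized I was running in Debug mode in VS.

I then built and ran in Release mode. IndexOf is still faster, but no longer by such a wide margin.

Here are the results (for loop vs. IndexOf):

Pair counting: 207 ms vs. 144 ms

Plain single-character counting: 141 ms vs. 111 ms

The performance of my own code really did improve.

Lesson learned: when you benchmark, do it in Release mode!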
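For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a timing harness built on System.Diagnostics.Stopwatch. It is not the code used for the numbers above: the class name BracketBenchmark, the helper AverageMs, and the synthetic input string are all assumptions made for illustration. It should be built in Release mode and run without a debugger attached (Ctrl+F5 in Visual Studio).

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class BracketBenchmark
{
    // Plain loop with a state flag, equivalent to the first snippet in the question.
    public static int CountPairsLoop(string html)
    {
        int count = 0;
        bool open = false;
        for (int j = 0; j < html.Length; j++)
        {
            char c = html[j];
            if (open && c == '>') { open = false; count++; }
            else if (!open && c == '<') { open = true; }
        }
        return count;
    }

    // Alternating IndexOf calls, equivalent to the second snippet in the question.
    public static int CountPairsIndexOf(string html)
    {
        int count = 0;
        int index = -1;
        do
        {
            index = html.IndexOf('<', index + 1);
            if (index != -1)
            {
                index = html.IndexOf('>', index + 1);
                if (index != -1) count++;
            }
        } while (index != -1);
        return count;
    }

    // Hypothetical helper: one untimed warm-up call, then the average of `runs` timed calls.
    public static double AverageMs(Func<int> action, int runs = 5)
    {
        action(); // warm-up, so JIT compilation is not part of the measurement
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / runs;
    }

    public static void Main()
    {
        // Synthetic input (an assumption; the original 60 MB string is not available).
        string html = string.Concat(Enumerable.Repeat("<p>some text</p>", 1000000));

        Console.WriteLine($"Debugger attached: {Debugger.IsAttached}");
        Console.WriteLine($"loop:    {AverageMs(() => CountPairsLoop(html)):F1} ms");
        Console.WriteLine($"IndexOf: {AverageMs(() => CountPairsIndexOf(html)):F1} ms");
    }
}
```

Printing Debugger.IsAttached makes it obvious when a run was accidentally measured under the debugger. The absolute numbers will depend on the machine, the runtime version, and the input.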

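Best answer

Are you running your timings from within Visual Studio? If so, your code will run significantly slower for that reason alone.

Aside from that, you are, to some degree, comparing apples and oranges. The two algorithms work in different ways.

The IndexOf version alternates between looking only for an opening bracket and looking only for a closing bracket. Your code goes through the whole string and keeps a status flag that indicates whether it is looking for an opening or a closing bracket. That takes more work per character and is expected to be slower.

Here is some code that does the comparison the same way as your IndexOf method.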

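int match3 = 0;
for (int j = 0; j < html.Length; j++) {
    if (html[j] == '<') {
        // Found an opening bracket: scan forward for the next closing bracket.
        for (j++; j < html.Length; j++) {
            if (html[j] == '>') {
                match3++;
                break; // stop at the first '>' so each pair is counted once
            }
        }
    }
}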

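In my tests this is actually about 3 times faster than the IndexOf method. The reason? Strings are not quite as simple as sequences of individual characters. There are markers, accents, and so on. String.IndexOf handles all that extra complexity properly, but it comes at a cost.

For more on c# - How can C#'s string.IndexOf run so fast, 10 times faster than an ordinary for-loop find?, see the similar question on Stack Overflow: https://stackoverflow.com/questions/10519380/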
