c# - String Benchmarks in C# - Refactoring for Speed/Maintainability

Tags: c# string refactoring

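I've been tinkering with small functions in my spare time, trying to find ways to refactor them (I recently read Martin Fowler's book Refactoring: Improving the Design of Existing Code). I found the following function, MakeNiceString(), while updating another part of the codebase near it, and it looked like a good candidate. As it stands there is no real reason to replace it, but it is small and does something small, so it is easy to follow while still giving a "good" refactoring experience.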

private static string MakeNiceString(string str)
        {
            char[] ca = str.ToCharArray();
            string result = null;
            int i = 0;
            result += System.Convert.ToString(ca[0]);
            for (i = 1; i <= ca.Length - 1; i++)
            {
                if (!(char.IsLower(ca[i])))
                {
                    result += " ";
                }
                result += System.Convert.ToString(ca[i]);
            }
            return result;
        }


static string SplitCamelCase(string str)
    {
        string[] temp = Regex.Split(str, @"(?<!^)(?=[A-Z])");
        string result = String.Join(" ", temp);
        return result;
    }

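The first function, MakeNiceString(), is one I found in some code I was updating at work. Its purpose is to turn ThisIsAString into This Is A String. It is used in half a dozen places in the code and is pretty insignificant in the whole scheme of things.

I built the second function purely as an academic exercise, to see whether using a regular expression would take longer.

Well, here are the results:

With 10 iterations: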

MakeNiceString took 2649 ticks
SplitCamelCase took 2502 ticks

But it changes drastically over the long haul:

With 10,000 iterations:

MakeNiceString took 121625 ticks
SplitCamelCase took 443001 ticks

Refactoring MakeNiceString()

The process of refactoring MakeNiceString() started with simply removing the conversions that were taking place. Doing that yielded the following results:

MakeNiceString took 124716 ticks
ImprovedMakeNiceString took 118486 ticks

Here's the code after Refactor #1:

private static string ImprovedMakeNiceString(string str)
        { //Removed Convert.ToString()
            char[] ca = str.ToCharArray();
            string result = null;
            int i = 0;
            result += ca[0];
            for (i = 1; i <= ca.Length - 1; i++)
            {
                if (!(char.IsLower(ca[i])))
                {
                    result += " ";
                }
                result += ca[i];
            }
            return result;
        }

Refactor #2 - Use StringBuilder

My second task was to use StringBuilder instead of String. Since String is immutable, unnecessary copies were being created throughout the loop. The benchmark for using that is below, as is the code:

static string RefactoredMakeNiceString(string str)
        {
            char[] ca = str.ToCharArray();
            StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
            int i = 0;
            sb.Append(ca[0]);
            for (i = 1; i <= ca.Length - 1; i++)
            {
                if (!(char.IsLower(ca[i])))
                {
                    sb.Append(" ");
                }
                sb.Append(ca[i]);
            }
            return sb.ToString();
        }

This results in the following benchmark:

MakeNiceString Took:           124497 Ticks   //Original
SplitCamelCase Took:           464459 Ticks   //Regex
ImprovedMakeNiceString Took:   117369 Ticks   //Remove Conversion
RefactoredMakeNiceString Took:  38542 Ticks   //Using StringBuilder

Changing the for loop to a foreach loop gave the following benchmark result:

static string RefactoredForEachMakeNiceString(string str)
        {
            char[] ca = str.ToCharArray();
            StringBuilder sb1 = new StringBuilder((str.Length * 5 / 4));
            bool first = true;
            foreach (char c in ca)
            {
                // Skip the space before the first character. Appending ca[0]
                // up front and then looping over the whole array would emit it twice.
                if (!first && !(char.IsLower(c)))
                {
                    sb1.Append(" ");
                }
                sb1.Append(c);
                first = false;
            }
            return sb1.ToString();
        }

RefactoredForEachMakeNiceString    Took:  45163 Ticks

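As you can see, maintenance-wise the foreach loop is the easiest to maintain and has the "cleanest" look. It is slightly slower than the for loop, but much easier to follow.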

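Alternate Refactor: Use Compiled Regex

I moved the Regex to just before the loop starts, hoping that since it only compiles once, it would execute faster. What I found (and I'm sure I have a bug somewhere) is that it doesn't speed things up the way it ought to: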

static void runTest5()
        {
            Regex rg = new Regex(@"(?<!^)(?=[A-Z])", RegexOptions.Compiled);
            for (int i = 0; i < 10000; i++)
            {
                CompiledRegex(rg, myString);
            }
        }
 static string CompiledRegex(Regex regex, string str)
    {
        string result = null;
        Regex rg1 = regex;
        string[] temp = rg1.Split(str);
        result = String.Join(" ", temp);
        return result;
    }

Final Benchmark Results:

MakeNiceString Took                   139363 Ticks
SplitCamelCase Took                   489174 Ticks
ImprovedMakeNiceString Took           115478 Ticks
RefactoredMakeNiceString Took          38819 Ticks
RefactoredForEachMakeNiceString Took   44700 Ticks
CompiledRegex Took                    227021 Ticks

Or, if you prefer milliseconds:

MakeNiceString Took                  38 ms
SplitCamelCase Took                 123 ms
ImprovedMakeNiceString Took          33 ms
RefactoredMakeNiceString Took        11 ms
RefactoredForEachMakeNiceString Took 12 ms
CompiledRegex Took                   63 ms
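
A note on the two tables above: Stopwatch ticks are not the fixed 100 ns ticks used by TimeSpan and DateTime; their length depends on the machine's timer, exposed as Stopwatch.Frequency. A minimal sketch of the conversion, assuming a hypothetical helper named TicksToMilliseconds (not part of the original code):

```csharp
using System;
using System.Diagnostics;

static class StopwatchTicksDemo
{
    // Hypothetical helper: converts raw Stopwatch ticks to milliseconds
    // for a given timer frequency (ticks per second).
    public static double TicksToMilliseconds(long ticks, long frequency)
    {
        return ticks * 1000.0 / frequency;
    }

    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        // ... the code under test would run here ...
        sw.Stop();

        Console.WriteLine("Timer frequency: {0} ticks/second", Stopwatch.Frequency);
        Console.WriteLine("Elapsed: {0} ticks = {1:F3} ms",
            sw.ElapsedTicks,
            TicksToMilliseconds(sw.ElapsedTicks, Stopwatch.Frequency));
    }
}
```

In practice sw.ElapsedMilliseconds or sw.Elapsed does this conversion already; the helper only makes the arithmetic visible.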

So the percentage gains are:

MakeNiceString                   38 ms   Baseline
SplitCamelCase                  123 ms   223.68% slower
ImprovedMakeNiceString           33 ms   13.16% faster
RefactoredMakeNiceString         11 ms   71.05% faster
RefactoredForEachMakeNiceString  12 ms   68.42% faster
CompiledRegex                    63 ms   65.79% slower

(Please check my math)
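
The percentages can be checked with a few lines of code. This is a minimal sketch; the helper name PercentChange is hypothetical, and a positive result means slower than the 38 ms baseline:

```csharp
using System;

static class PercentCheck
{
    // Hypothetical helper: relative change of 'measured' against 'baseline',
    // in percent. Positive means slower, negative means faster.
    public static double PercentChange(double baseline, double measured)
    {
        return (measured - baseline) / baseline * 100.0;
    }

    static void Main()
    {
        const double baselineMs = 38; // MakeNiceString
        string[] names =
        {
            "SplitCamelCase", "ImprovedMakeNiceString", "RefactoredMakeNiceString",
            "RefactoredForEachMakeNiceString", "CompiledRegex"
        };
        double[] ms = { 123, 33, 11, 12, 63 };

        for (int i = 0; i < names.Length; i++)
        {
            double change = PercentChange(baselineMs, ms[i]);
            Console.WriteLine("{0,-32} {1,6:F2}% {2}",
                names[i], Math.Abs(change), change > 0 ? "slower" : "faster");
        }
    }
}
```

Rounded to two decimals, the values come out as 223.68, 13.16, 71.05, 68.42 and 65.79 percent.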

In the end, I'm going to replace what's there with the RefactoredForEachMakeNiceString() and while I'm at it, I'm going to rename it to something useful, like SplitStringOnUpperCase.

Benchmark Test:

To benchmark, I simply invoke a new Stopwatch for each method call:

       string myString = "ThisIsAUpperCaseString";
       Stopwatch sw = new Stopwatch();
       sw.Start();
       runTest();
       sw.Stop();

     static void runTest()
        {

            for (int i = 0; i < 10000; i++)
            {
                MakeNiceString(myString);
            }


        }
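
With only 10 iterations, one-off costs such as JIT compilation of each method (and, for the regex version, the first construction of the Regex) probably make up much of the measurement, which may be part of why the 10-iteration and 10,000-iteration results look so different. Below is a minimal sketch of a harness that runs an untimed warm-up pass first. The names Measure, warmupIterations and SplitOnUpperCase are illustrative, not from the original code:

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class WarmupBenchmark
{
    // Hypothetical helper: runs 'function' untimed for a warm-up pass,
    // then times 'iterations' calls and returns the elapsed Stopwatch ticks.
    public static long Measure(Func<string, string> function, string input,
                               int warmupIterations, int iterations)
    {
        for (int i = 0; i < warmupIterations; i++)
        {
            function(input); // lets JIT compilation and one-time setup happen first
        }

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            function(input);
        }
        sw.Stop();
        return sw.ElapsedTicks;
    }

    // Stand-in for the method under test (a StringBuilder-based variant).
    public static string SplitOnUpperCase(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        for (int i = 0; i < str.Length; i++)
        {
            if (i > 0 && char.IsUpper(str[i]))
            {
                sb.Append(' ');
            }
            sb.Append(str[i]);
        }
        return sb.ToString();
    }

    static void Main()
    {
        const string myString = "ThisIsAUpperCaseString";
        foreach (int n in new[] { 10, 10000 })
        {
            long ticks = Measure(SplitOnUpperCase, myString, 1000, n);
            Console.WriteLine("{0} iterations took {1} ticks", n, ticks);
        }
    }
}
```

If the per-iteration cost at 10 and at 10,000 iterations moves much closer together with the warm-up in place, the earlier gap was mostly start-up overhead.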

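Questions

  • What causes these functions to be so different "over the long haul", and
  • How can I improve this function so that it is a) more maintainable or b) faster?
  • How do I run memory benchmarks on these to see which uses less memory?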
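
For the memory question, one rough approach is to compare how many generation 0 garbage collections each version triggers over the same number of calls, since the concatenation versions allocate a new string on every +=. The sketch below uses GC.CollectionCount. The helper name CountGen0Collections and the two stand-in methods are assumptions for illustration, and the count is only a coarse indicator, not a precise allocation measurement:

```csharp
using System;
using System.Text;

static class GcPressureDemo
{
    // Hypothetical helper: number of gen 0 collections that occur while
    // 'function' is called 'iterations' times.
    public static int CountGen0Collections(Func<string, string> function,
                                           string input, int iterations)
    {
        // Start from a clean heap so earlier garbage does not skew the count.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int before = GC.CollectionCount(0);
        for (int i = 0; i < iterations; i++)
        {
            function(input);
        }
        return GC.CollectionCount(0) - before;
    }

    // Stand-in for the string concatenation versions.
    public static string WithConcatenation(string str)
    {
        string result = "";
        for (int i = 0; i < str.Length; i++)
        {
            if (i > 0 && char.IsUpper(str[i])) result += " ";
            result += str[i];
        }
        return result;
    }

    // Stand-in for the StringBuilder versions.
    public static string WithStringBuilder(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        for (int i = 0; i < str.Length; i++)
        {
            if (i > 0 && char.IsUpper(str[i])) sb.Append(' ');
            sb.Append(str[i]);
        }
        return sb.ToString();
    }

    static void Main()
    {
        const string input = "ThisIsAUpperCaseString";
        const int iterations = 1000000;
        Console.WriteLine("Concatenation: {0} gen 0 collections",
            CountGen0Collections(WithConcatenation, input, iterations));
        Console.WriteLine("StringBuilder: {0} gen 0 collections",
            CountGen0Collections(WithStringBuilder, input, iterations));
    }
}
```

A memory profiler gives far more detail; this only shows which version puts more pressure on the garbage collector.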

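Thank you for your responses so far. I've incorporated all of the suggestions made by @Jon Skeet, and would like feedback on the updated questions I've asked as a result.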

NB: This question is meant to explore ways to refactor string handling functions in C#. I copied/pasted the first code as is. I'm well aware you can remove the System.Convert.ToString() in the first method, and I did just that. If anyone is aware of any implications of removing the System.Convert.ToString(), that would also be helpful to know.

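Best answer

1) Use a StringBuilder, preferably with a reasonable initial capacity (e.g. string length * 5/4, to allow one extra space per four characters).

2) Try using a foreach loop instead of a for loop - it may well be simpler.

3) You don't need to convert the string into a char array first - foreach already works over a string, or you can use the indexer.

4) Don't do extra string conversions everywhere - calling Convert.ToString(char) and then appending that string is pointless; there's no need for the single-character string.

5) For the second option, just build the regex once, outside the method. Try it with RegexOptions.Compiled as well.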
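
The capacity estimate in point 1 is plain integer arithmetic. Here is a minimal sketch for the 22-character test string used in the question; the class and helper names are illustrative, and StringBuilder grows automatically if an estimate turns out too small:

```csharp
using System;

static class CapacityDemo
{
    // Hypothetical helper mirroring the suggested estimate: length * 5 / 4.
    public static int EstimateCapacity(string str)
    {
        return str.Length * 5 / 4; // integer division truncates
    }

    static void Main()
    {
        const string input = "ThisIsAUpperCaseString";         // 22 characters
        const string expected = "This Is A Upper Case String"; // 22 letters + 5 spaces

        Console.WriteLine("Input length:       {0}", input.Length);
        Console.WriteLine("Estimated capacity: {0}", EstimateCapacity(input));
        Console.WriteLine("Output length:      {0}", expected.Length);
    }
}
```

For this input the estimate is 22 * 5 / 4 = 27, which happens to match the 27-character result exactly, so the builder never has to resize.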

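EDIT: Okay, full benchmark results. I've tried a few more things, and also ran the code with rather more iterations to get a more accurate result. This is only running on an Eee PC, so no doubt it will run faster on "real" PCs, but I suspect the broad results still hold. First the code: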

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Text.RegularExpressions;

class Benchmark
{
    const string TestData = "ThisIsAUpperCaseString";
    const string ValidResult = "This Is A Upper Case String";
    const int Iterations = 1000000;

    static void Main(string[] args)
    {
        Test(BenchmarkOverhead);
        Test(MakeNiceString);
        Test(ImprovedMakeNiceString);
        Test(RefactoredMakeNiceString);
        Test(MakeNiceStringWithStringIndexer);
        Test(MakeNiceStringWithForeach);
        Test(MakeNiceStringWithForeachAndLinqSkip);
        Test(MakeNiceStringWithForeachAndCustomSkip);
        Test(SplitCamelCase);
        Test(SplitCamelCaseCachedRegex);
        Test(SplitCamelCaseCompiledRegex);        
    }

    static void Test(Func<string,string> function)
    {
        Console.Write("{0}... ", function.Method.Name);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i=0; i < Iterations; i++)
        {
            string result = function(TestData);
            if (result.Length != ValidResult.Length)
            {
                throw new Exception("Bad result: " + result);
            }
        }
        sw.Stop();
        Console.WriteLine(" {0}ms", sw.ElapsedMilliseconds);
        GC.Collect();
    }

    private static string BenchmarkOverhead(string str)
    {
        return ValidResult;
    }

    private static string MakeNiceString(string str)
    {
        char[] ca = str.ToCharArray();
        string result = null;
        int i = 0;
        result += System.Convert.ToString(ca[0]);
        for (i = 1; i <= ca.Length - 1; i++)
        {
            if (!(char.IsLower(ca[i])))
            {
                result += " ";
            }
            result += System.Convert.ToString(ca[i]);
        }
        return result;
    }

    private static string ImprovedMakeNiceString(string str)
    { //Removed Convert.ToString()
        char[] ca = str.ToCharArray();
        string result = null;
        int i = 0;
        result += ca[0];
        for (i = 1; i <= ca.Length - 1; i++)
        {
            if (!(char.IsLower(ca[i])))
            {
                result += " ";
            }
            result += ca[i];
        }
        return result;
    }

    private static string RefactoredMakeNiceString(string str)
    {
        char[] ca = str.ToCharArray();
        StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
        int i = 0;
        sb.Append(ca[0]);
        for (i = 1; i <= ca.Length - 1; i++)
        {
            if (!(char.IsLower(ca[i])))
            {
                sb.Append(" ");
            }
            sb.Append(ca[i]);
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithStringIndexer(string str)
    {
        StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
        sb.Append(str[0]);
        for (int i = 1; i < str.Length; i++)
        {
            char c = str[i];
            if (!(char.IsLower(c)))
            {
                sb.Append(" ");
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithForeach(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        bool first = true;      
        foreach (char c in str)
        {
            if (!first && char.IsUpper(c))
            {
                sb.Append(" ");
            }
            sb.Append(c);
            first = false;
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithForeachAndLinqSkip(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        sb.Append(str[0]);
        foreach (char c in str.Skip(1))
        {
            if (char.IsUpper(c))
            {
                sb.Append(" ");
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithForeachAndCustomSkip(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        sb.Append(str[0]);
        foreach (char c in new SkipEnumerable<char>(str, 1))
        {
            if (char.IsUpper(c))
            {
                sb.Append(" ");
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static string SplitCamelCase(string str)
    {
        string[] temp = Regex.Split(str, @"(?<!^)(?=[A-Z])");
        string result = String.Join(" ", temp);
        return result;
    }

    private static readonly Regex CachedRegex = new Regex("(?<!^)(?=[A-Z])");    
    private static string SplitCamelCaseCachedRegex(string str)
    {
        string[] temp = CachedRegex.Split(str);
        string result = String.Join(" ", temp);
        return result;
    }

    private static readonly Regex CompiledRegex =
        new Regex("(?<!^)(?=[A-Z])", RegexOptions.Compiled);    
    private static string SplitCamelCaseCompiledRegex(string str)
    {
        string[] temp = CompiledRegex.Split(str);
        string result = String.Join(" ", temp);
        return result;
    }

    private class SkipEnumerable<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> original;
        private readonly int skip;

        public SkipEnumerable(IEnumerable<T> original, int skip)
        {
            this.original = original;
            this.skip = skip;
        }

        public IEnumerator<T> GetEnumerator()
        {
            IEnumerator<T> ret = original.GetEnumerator();
            for (int i=0; i < skip; i++)
            {
                ret.MoveNext();
            }
            return ret;
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}

Now the results:

BenchmarkOverhead...  22ms
MakeNiceString...  10062ms
ImprovedMakeNiceString...  12367ms
RefactoredMakeNiceString...  3489ms
MakeNiceStringWithStringIndexer...  3115ms
MakeNiceStringWithForeach...  3292ms
MakeNiceStringWithForeachAndLinqSkip...  5702ms
MakeNiceStringWithForeachAndCustomSkip...  4490ms
SplitCamelCase...  68267ms
SplitCamelCaseCachedRegex...  52529ms
SplitCamelCaseCompiledRegex...  26806ms

As you can see, the string indexer version is the winner - it's also pretty simple code.
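
One thing the length check in Test would not catch: the variants are not all equivalent. Those testing !char.IsLower(c) also insert a space before digits and punctuation, while those testing char.IsUpper(c) do not, and the versions that read str[0] throw on an empty string. A minimal sketch of the difference, with illustrative method names and an added guard that is not in the benchmarked code:

```csharp
using System;
using System.Text;

static class SplitVariantsDemo
{
    // Same test as the string indexer version: space before anything not lowercase.
    public static string SplitOnNonLower(string str)
    {
        if (string.IsNullOrEmpty(str)) return str; // guard: str[0] would throw on ""
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        sb.Append(str[0]);
        for (int i = 1; i < str.Length; i++)
        {
            char c = str[i];
            if (!char.IsLower(c)) sb.Append(' ');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Same test as the foreach versions: space only before uppercase letters.
    public static string SplitOnUpper(string str)
    {
        if (string.IsNullOrEmpty(str)) return str;
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        sb.Append(str[0]);
        for (int i = 1; i < str.Length; i++)
        {
            char c = str[i];
            if (char.IsUpper(c)) sb.Append(' ');
            sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        foreach (string input in new[] { "ThisIsAUpperCaseString", "Version2Beta" })
        {
            Console.WriteLine("{0} -> \"{1}\" / \"{2}\"",
                input, SplitOnNonLower(input), SplitOnUpper(input));
        }
    }
}
```

For "ThisIsAUpperCaseString" both return "This Is A Upper Case String"; for "Version2Beta" the first returns "Version 2 Beta" and the second "Version2 Beta". Which behavior is wanted depends on the strings the function will actually see.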

Hope this helps... and don't forget, there are bound to be other options I haven't thought of!

Regarding c# - String Benchmarks in C# - Refactoring for Speed/Maintainability, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/473087/
