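I've been tinkering with small functions in my spare time, trying to find ways to refactor them (I recently read Martin Fowler's book Refactoring: Improving the Design of Existing Code). I found the following function, MakeNiceString(), while updating another part of the codebase near it, and it looked like a good candidate. As it stands, there's no real reason to replace it, but it's small and does so little that it's easy to follow while still giving a "good" refactoring experience.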
private static string MakeNiceString(string str)
{
    char[] ca = str.ToCharArray();
    string result = null;
    int i = 0;
    result += System.Convert.ToString(ca[0]);
    for (i = 1; i <= ca.Length - 1; i++)
    {
        if (!(char.IsLower(ca[i])))
        {
            result += " ";
        }
        result += System.Convert.ToString(ca[i]);
    }
    return result;
}

static string SplitCamelCase(string str)
{
    string[] temp = Regex.Split(str, @"(?<!^)(?=[A-Z])");
    string result = String.Join(" ", temp);
    return result;
}
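The first function, MakeNiceString(), is one I found in some code I was updating at work. Its purpose is to convert ThisIsAString to This Is A String. It's used in half a dozen places in the code and is pretty insignificant in the overall scheme of things.
I built the second function purely as an academic exercise, to see whether using a regular expression would take longer.
Well, here are the results: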
With 10 iterations:
MakeNiceString took 2649 ticks
SplitCamelCase took 2502 ticks
However, it changes drastically over the long haul:
With 10,000 iterations:
MakeNiceString took 121625 ticks
SplitCamelCase took 443001 ticks
Refactoring MakeNiceString()
The process of refactoring MakeNiceString() started with simply removing the conversions that were taking place. Doing that yielded the following results:
MakeNiceString took 124716 ticks
ImprovedMakeNiceString took 118486 ticks
Here's the code after Refactor #1:
private static string ImprovedMakeNiceString(string str)
{ //Removed Convert.ToString()
    char[] ca = str.ToCharArray();
    string result = null;
    int i = 0;
    result += ca[0];
    for (i = 1; i <= ca.Length - 1; i++)
    {
        if (!(char.IsLower(ca[i])))
        {
            result += " ";
        }
        result += ca[i];
    }
    return result;
}
Refactor #2 - Use StringBuilder
My second task was to use StringBuilder instead of String. Since String is immutable, unnecessary copies were being created throughout the loop. The benchmark for using that is below, as is the code:
static string RefactoredMakeNiceString(string str)
{
    char[] ca = str.ToCharArray();
    StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
    int i = 0;
    sb.Append(ca[0]);
    for (i = 1; i <= ca.Length - 1; i++)
    {
        if (!(char.IsLower(ca[i])))
        {
            sb.Append(" ");
        }
        sb.Append(ca[i]);
    }
    return sb.ToString();
}
This results in the following benchmark:
MakeNiceString Took: 124497 Ticks //Original
SplitCamelCase Took: 464459 Ticks //Regex
ImprovedMakeNiceString Took: 117369 Ticks //Remove Conversion
RefactoredMakeNiceString Took: 38542 Ticks //Using StringBuilder
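To see why the StringBuilder version pulls ahead, it may help to count how many characters get copied under each approach. The following is only a rough model, assuming every `+=` allocates a new string and copies the existing contents, and that the StringBuilder's initial capacity is large enough that it never resizes. The class and method names are hypothetical, not part of the original code.

```csharp
using System;

static class ConcatCost
{
    // Characters copied when a string of n characters is built one
    // piece at a time with +=: each step copies everything built so far
    // plus the new character, so the total is 1 + 2 + ... + n.
    public static long CopiedByConcat(int n)
    {
        long total = 0;
        for (int length = 1; length <= n; length++)
        {
            total += length;
        }
        return total;
    }

    // With a StringBuilder that never has to grow, each character is
    // written into the buffer once, and ToString() copies the finished
    // buffer once more.
    public static long CopiedByBuilder(int n)
    {
        return 2L * n;
    }

    static void Main()
    {
        int n = 27; // length of "This Is A Upper Case String"
        Console.WriteLine("concat:  " + CopiedByConcat(n));
        Console.WriteLine("builder: " + CopiedByBuilder(n));
    }
}
```

Under this model the 27-character result costs 378 copied characters with concatenation versus 54 with the builder, and the gap widens quadratically as the input grows. Real timings also include allocation and garbage collection, which the model ignores.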
Changing the for loop to a foreach loop yields the following benchmark result:
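static string RefactoredForEachMakeNiceString(string str)
{
    char[] ca = str.ToCharArray();
    StringBuilder sb1 = new StringBuilder((str.Length * 5 / 4));
    bool first = true;
    foreach (char c in ca)
    {
        // Skip the separator before the first character. Appending ca[0]
        // up front and then iterating the whole array would emit it twice.
        if (!first && !(char.IsLower(c)))
        {
            sb1.Append(" ");
        }
        sb1.Append(c);
        first = false;
    }
    return sb1.ToString();
}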
RefactoredForEachMakeNiceString Took: 45163 Ticks
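As you can see, maintenance-wise, the foreach loop is the easiest to maintain and has the "cleanest" look. It is slightly slower than the for loop, but far easier to follow.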
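Alternate Refactor: Use a Compiled Regex
I moved the Regex to right before the loop begins, hoping that since it's only compiled once, it would execute faster. What I found (and I'm sure I have a bug somewhere) is that it didn't speed up the way it ought to: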
static void runTest5()
{
    Regex rg = new Regex(@"(?<!^)(?=[A-Z])", RegexOptions.Compiled);
    for (int i = 0; i < 10000; i++)
    {
        CompiledRegex(rg, myString);
    }
}

static string CompiledRegex(Regex regex, string str)
{
    string result = null;
    Regex rg1 = regex;
    string[] temp = rg1.Split(str);
    result = String.Join(" ", temp);
    return result;
}
Final benchmark results:
MakeNiceString Took 139363 Ticks
SplitCamelCase Took 489174 Ticks
ImprovedMakeNiceString Took 115478 Ticks
RefactoredMakeNiceString Took 38819 Ticks
RefactoredForEachMakeNiceString Took 44700 Ticks
CompiledRegex Took 227021 Ticks
Or, if you prefer milliseconds:
MakeNiceString Took 38 ms
SplitCamelCase Took 123 ms
ImprovedMakeNiceString Took 33 ms
RefactoredMakeNiceString Took 11 ms
RefactoredForEachMakeNiceString Took 12 ms
CompiledRegex Took 63 ms
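Stopwatch ticks are not a fixed unit: their length depends on the timer frequency of the machine, which Stopwatch.Frequency reports in ticks per second. A minimal sketch of the conversion from ticks to milliseconds follows; the class and helper names are hypothetical, and the frequency on the machine that produced the numbers above is not stated in the post.

```csharp
using System;
using System.Diagnostics;

static class TickConversion
{
    // Converts a Stopwatch tick count to milliseconds for a given
    // timer frequency (ticks per second).
    public static double TicksToMilliseconds(long ticks, long frequency)
    {
        return ticks * 1000.0 / frequency;
    }

    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        // ... the code under test would run here ...
        sw.Stop();

        double ms = TicksToMilliseconds(sw.ElapsedTicks, Stopwatch.Frequency);
        Console.WriteLine("{0} ticks = {1:F3} ms", sw.ElapsedTicks, ms);
    }
}
```

In practice sw.ElapsedMilliseconds performs the same conversion and returns a whole number of milliseconds, which is presumably how figures like the ones above were obtained.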
So the percentage gains are:
MakeNiceString                    38 ms   Baseline
SplitCamelCase                   123 ms   223.68% slower
ImprovedMakeNiceString            33 ms   13.16% faster
RefactoredMakeNiceString          11 ms   71.05% faster
RefactoredForEachMakeNiceString   12 ms   68.42% faster
CompiledRegex                     63 ms   65.79% slower
(Please check my math)
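One way to check the math is to compute each timing's difference from the baseline as a fraction of the baseline. This is a small sketch using the millisecond figures quoted above; the class and method names are hypothetical.

```csharp
using System;

static class PercentChange
{
    // Percentage difference of a timing relative to the baseline.
    // A negative value means faster than the baseline, a positive
    // value means slower.
    public static double RelativeToBaseline(double baselineMs, double ms)
    {
        return (ms - baselineMs) / baselineMs * 100.0;
    }

    static void Main()
    {
        const double baseline = 38; // MakeNiceString, in ms
        string[] names =
        {
            "SplitCamelCase",
            "ImprovedMakeNiceString",
            "RefactoredMakeNiceString",
            "RefactoredForEachMakeNiceString",
            "CompiledRegex"
        };
        double[] timings = { 123, 33, 11, 12, 63 };

        for (int i = 0; i < names.Length; i++)
        {
            double change = RelativeToBaseline(baseline, timings[i]);
            Console.WriteLine("{0}: {1:F2}%", names[i], change);
        }
    }
}
```

Because the inputs are whole milliseconds, the percentages are only as precise as that rounding allows; recomputing from the raw tick counts would give slightly different figures.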
In the end, I'm going to replace what's there with RefactoredForEachMakeNiceString(), and while I'm at it, I'm going to rename it to something useful, like SplitStringOnUpperCase.
Benchmark Test:
To benchmark, I simply invoke a new Stopwatch for each method call:
string myString = "ThisIsAUpperCaseString";
Stopwatch sw = new Stopwatch();
sw.Start();
runTest();
sw.Stop();

static void runTest()
{
    for (int i = 0; i < 10000; i++)
    {
        MakeNiceString(myString);
    }
}
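Questions
- What causes these functions to behave so differently "over the long haul", and
- How can I improve this function to be a) more maintainable or b) faster?
- How would I do memory benchmarks on these to see which uses less memory?
Thank you for your responses thus far. I've incorporated all of the suggestions made by @Jon Skeet, and would like feedback on the updated questions I've asked as a result.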
NB: This question is meant to explore ways to refactor string handling functions in C#. I copied/pasted the first code as is. I'm well aware you can remove the System.Convert.ToString() in the first method, and I did just that. If anyone is aware of any implications of removing the System.Convert.ToString(), that would also be helpful to know.
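Best answer
1) Use a StringBuilder, preferably set with a reasonable initial capacity (e.g. string length * 5/4, to allow one extra space per four characters).
2) Try using a foreach loop instead of a for loop - it may well be simpler.
3) You don't need to convert the string into a char array first - foreach will iterate over a string already, or you can use the indexer.
4) Don't do extra string conversions everywhere - calling Convert.ToString(char) and then appending that string is pointless; there's no need for the single-character string.
5) For the second option, just build the regex once, outside the method. Try it with RegexOptions.Compiled as well.
EDIT: Okay, full benchmark results. I've tried a few more things, and also run the code with rather more iterations to get a more accurate result. This is only running on an Eee PC, so no doubt it will run faster on "real" PCs, but I suspect the broad results still hold. First the code: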
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Text.RegularExpressions;

class Benchmark
{
    const string TestData = "ThisIsAUpperCaseString";
    const string ValidResult = "This Is A Upper Case String";
    const int Iterations = 1000000;

    static void Main(string[] args)
    {
        Test(BenchmarkOverhead);
        Test(MakeNiceString);
        Test(ImprovedMakeNiceString);
        Test(RefactoredMakeNiceString);
        Test(MakeNiceStringWithStringIndexer);
        Test(MakeNiceStringWithForeach);
        Test(MakeNiceStringWithForeachAndLinqSkip);
        Test(MakeNiceStringWithForeachAndCustomSkip);
        Test(SplitCamelCase);
        Test(SplitCamelCaseCachedRegex);
        Test(SplitCamelCaseCompiledRegex);
    }

    static void Test(Func<string,string> function)
    {
        Console.Write("{0}... ", function.Method.Name);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i=0; i < Iterations; i++)
        {
            string result = function(TestData);
            if (result.Length != ValidResult.Length)
            {
                throw new Exception("Bad result: " + result);
            }
        }
        sw.Stop();
        Console.WriteLine(" {0}ms", sw.ElapsedMilliseconds);
        GC.Collect();
    }

    private static string BenchmarkOverhead(string str)
    {
        return ValidResult;
    }

    private static string MakeNiceString(string str)
    {
        char[] ca = str.ToCharArray();
        string result = null;
        int i = 0;
        result += System.Convert.ToString(ca[0]);
        for (i = 1; i <= ca.Length - 1; i++)
        {
            if (!(char.IsLower(ca[i])))
            {
                result += " ";
            }
            result += System.Convert.ToString(ca[i]);
        }
        return result;
    }

    private static string ImprovedMakeNiceString(string str)
    { //Removed Convert.ToString()
        char[] ca = str.ToCharArray();
        string result = null;
        int i = 0;
        result += ca[0];
        for (i = 1; i <= ca.Length - 1; i++)
        {
            if (!(char.IsLower(ca[i])))
            {
                result += " ";
            }
            result += ca[i];
        }
        return result;
    }

    private static string RefactoredMakeNiceString(string str)
    {
        char[] ca = str.ToCharArray();
        StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
        int i = 0;
        sb.Append(ca[0]);
        for (i = 1; i <= ca.Length - 1; i++)
        {
            if (!(char.IsLower(ca[i])))
            {
                sb.Append(" ");
            }
            sb.Append(ca[i]);
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithStringIndexer(string str)
    {
        StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
        sb.Append(str[0]);
        for (int i = 1; i < str.Length; i++)
        {
            char c = str[i];
            if (!(char.IsLower(c)))
            {
                sb.Append(" ");
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithForeach(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        bool first = true;
        foreach (char c in str)
        {
            if (!first && char.IsUpper(c))
            {
                sb.Append(" ");
            }
            sb.Append(c);
            first = false;
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithForeachAndLinqSkip(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        sb.Append(str[0]);
        foreach (char c in str.Skip(1))
        {
            if (char.IsUpper(c))
            {
                sb.Append(" ");
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithForeachAndCustomSkip(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        sb.Append(str[0]);
        foreach (char c in new SkipEnumerable<char>(str, 1))
        {
            if (char.IsUpper(c))
            {
                sb.Append(" ");
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static string SplitCamelCase(string str)
    {
        string[] temp = Regex.Split(str, @"(?<!^)(?=[A-Z])");
        string result = String.Join(" ", temp);
        return result;
    }

    private static readonly Regex CachedRegex = new Regex("(?<!^)(?=[A-Z])");

    private static string SplitCamelCaseCachedRegex(string str)
    {
        string[] temp = CachedRegex.Split(str);
        string result = String.Join(" ", temp);
        return result;
    }

    private static readonly Regex CompiledRegex =
        new Regex("(?<!^)(?=[A-Z])", RegexOptions.Compiled);

    private static string SplitCamelCaseCompiledRegex(string str)
    {
        string[] temp = CompiledRegex.Split(str);
        string result = String.Join(" ", temp);
        return result;
    }

    private class SkipEnumerable<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> original;
        private readonly int skip;

        public SkipEnumerable(IEnumerable<T> original, int skip)
        {
            this.original = original;
            this.skip = skip;
        }

        public IEnumerator<T> GetEnumerator()
        {
            IEnumerator<T> ret = original.GetEnumerator();
            for (int i=0; i < skip; i++)
            {
                ret.MoveNext();
            }
            return ret;
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}
Now the results:
BenchmarkOverhead... 22ms
MakeNiceString... 10062ms
ImprovedMakeNiceString... 12367ms
RefactoredMakeNiceString... 3489ms
MakeNiceStringWithStringIndexer... 3115ms
MakeNiceStringWithForeach... 3292ms
MakeNiceStringWithForeachAndLinqSkip... 5702ms
MakeNiceStringWithForeachAndCustomSkip... 4490ms
SplitCamelCase... 68267ms
SplitCamelCaseCachedRegex... 52529ms
SplitCamelCaseCompiledRegex... 26806ms
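As you can see, the string indexer version is the winner - it's also pretty simple code.
Hope this helps... and don't forget, there are bound to be other options I haven't thought of!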
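The timings above don't address the memory question. One rough way to compare allocations is to read the per-thread allocation counter before and after a batch of calls. This is only a sketch: it assumes a runtime where GC.GetAllocatedBytesForCurrentThread is available (.NET Core 3.0 or later, or .NET Framework 4.8), and the class name, the helper MeasureAllocatedBytes and the two simplified functions are hypothetical stand-ins rather than the methods benchmarked above.

```csharp
using System;
using System.Text;

static class AllocationProbe
{
    // Bytes allocated on the current thread while calling the function
    // `iterations` times. One warm-up call is made first so that
    // one-time JIT work is less likely to be counted.
    public static long MeasureAllocatedBytes(
        Func<string, string> function, string input, int iterations)
    {
        function(input);
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            function(input);
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        return after - before;
    }

    // Simplified stand-in for the concatenation-based approach.
    public static string WithConcatenation(string str)
    {
        string result = "";
        for (int i = 0; i < str.Length; i++)
        {
            if (i > 0 && !char.IsLower(str[i]))
            {
                result += " ";
            }
            result += str[i];
        }
        return result;
    }

    // Simplified stand-in for the StringBuilder-based approach.
    public static string WithStringBuilder(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        for (int i = 0; i < str.Length; i++)
        {
            if (i > 0 && !char.IsLower(str[i]))
            {
                sb.Append(" ");
            }
            sb.Append(str[i]);
        }
        return sb.ToString();
    }

    static void Main()
    {
        const string input = "ThisIsAUpperCaseString";
        const int iterations = 10000;
        Console.WriteLine("Concatenation: {0} bytes",
            MeasureAllocatedBytes(WithConcatenation, input, iterations));
        Console.WriteLine("StringBuilder: {0} bytes",
            MeasureAllocatedBytes(WithStringBuilder, input, iterations));
    }
}
```

The counter reports bytes allocated, not peak memory in use, so it indicates how much garbage each approach produces rather than how large the process gets. A memory profiler would be the more thorough tool if that distinction matters.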
Source: the Stack Overflow question "c# - String Benchmarks in C# - Refactoring for Speed/Maintainability": https://stackoverflow.com/questions/473087/