Obviously, the main use of Trim is to remove leading and trailing spaces from a string, for example:
" hello ".Trim(); // results in "hello"
But Trim also removes other characters, such as \n, \r and \t, so:
" \nhello\r\t ".Trim(); // it also produces "hello"
Is there a definitive list of all the characters that Trim will remove (preferably in string-escape form, like \n)?
Edit: Thanks for the detailed answers - I now know the exact characters. The Wikipedia list that @RayKoopa left in the comments is probably the best-looking format for me.
Best answer
We can look at the source code of the String class. The public Trim() method calls an internal helper method named TrimHelper():
public String Trim() {
    Contract.Ensures(Contract.Result<String>() != null);
    Contract.EndContractBlock();
    return TrimHelper(TrimBoth);
}
TrimHelper() looks like this:
[System.Security.SecuritySafeCritical] // auto-generated
private String TrimHelper(int trimType) {
    //end will point to the first non-trimmed character on the right
    //start will point to the first non-trimmed character on the Left
    int end = this.Length-1;
    int start=0;
    //Trim specified characters.
    if (trimType !=TrimTail) {
        for (start=0; start < this.Length; start++) {
            if (!Char.IsWhiteSpace(this[start]) && !IsBOMWhitespace(this[start])) break;
        }
    }
    if (trimType !=TrimHead) {
        for (end= Length -1; end >= start; end--) {
            if (!Char.IsWhiteSpace(this[end]) && !IsBOMWhitespace(this[start])) break;
        }
    }
    return CreateTrimmedString(start, end);
}
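To make the two scans easier to follow, here is a minimal standalone sketch of the same idea. The names TrimSketch and TrimBoth are hypothetical and not part of the framework, and the sketch assumes that only Char.IsWhiteSpace matters, so it leaves out the IsBOMWhitespace check for brevity:

```csharp
using System;

static class TrimSketch
{
    // Hypothetical helper: mirrors the two loops of TrimHelper(TrimBoth),
    // using only Char.IsWhiteSpace (the IsBOMWhitespace check is omitted).
    public static string TrimBoth(string s)
    {
        int start = 0;
        int end = s.Length - 1;

        // Advance start past leading white space.
        while (start <= end && char.IsWhiteSpace(s[start])) start++;

        // Move end back past trailing white space.
        while (end >= start && char.IsWhiteSpace(s[end])) end--;

        // For an empty or all-white-space string, end == start - 1, so the length is 0.
        return s.Substring(start, end - start + 1);
    }

    static void Main()
    {
        string input = " \nhello\r\t ";
        // Compares the sketch with the built-in Trim for the question's example.
        Console.WriteLine(TrimBoth(input) == input.Trim());
    }
}
```

The point is that the set of trimmed characters is decided entirely by the predicate used in the loops, which is why the rest of this answer looks at Char.IsWhiteSpace.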
So most of your question comes down to examining the Char.IsWhiteSpace method:
[Pure]
public static bool IsWhiteSpace(char c) {
    if (IsLatin1(c)) {
        return (IsWhiteSpaceLatin1(c));
    }
    return CharUnicodeInfo.IsWhiteSpace(c);
}
If it is a Latin-1 character, this is what counts as white space:
private static bool IsWhiteSpaceLatin1(char c) {
    // There are characters which belong to UnicodeCategory.Control but are considered as white spaces.
    // We use code point comparisons for these characters here as a temporary fix.
    // U+0009 = <control> HORIZONTAL TAB
    // U+000a = <control> LINE FEED
    // U+000b = <control> VERTICAL TAB
    // U+000c = <control> FORM FEED
    // U+000d = <control> CARRIAGE RETURN
    // U+0085 = <control> NEXT LINE
    // U+00a0 = NO-BREAK SPACE
    if ((c == ' ') || (c >= '\x0009' && c <= '\x000d') || c == '\x00a0' || c == '\x0085') {
        return (true);
    }
    return (false);
}
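Since the question asks for the list in escape form, the Latin-1 part can be generated rather than read off the source. This is only a sketch: the class Latin1WhiteSpace and its Escapes method are hypothetical names, and it simply asks Char.IsWhiteSpace about every character from U+0000 to U+00FF:

```csharp
using System;
using System.Collections.Generic;

static class Latin1WhiteSpace
{
    // Hypothetical helper: returns a \uXXXX escape for every character
    // in U+0000..U+00FF that Char.IsWhiteSpace reports as white space.
    public static List<string> Escapes()
    {
        var result = new List<string>();
        for (int i = 0; i <= 0xFF; i++)
        {
            if (char.IsWhiteSpace((char)i))
            {
                result.Add("\\u" + i.ToString("X4"));
            }
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", Escapes()));
    }
}
```

On a runtime that matches the condition in IsWhiteSpaceLatin1, this should list the space, U+0009 through U+000D (\t, \n, \v, \f, \r), U+0085 and U+00A0.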
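Otherwise we have to go to CharUnicodeInfo.cs, which checks the character's UnicodeCategory enum value to decide whether it is white space: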
internal static bool IsWhiteSpace(char c)
{
    UnicodeCategory uc = GetUnicodeCategory(c);
    // In Unicode 3.0, U+2028 is the only character which is under the category "LineSeparator".
    // And U+2029 is the only character which is under the category "ParagraphSeparator".
    switch (uc) {
        case (UnicodeCategory.SpaceSeparator):
        case (UnicodeCategory.LineSeparator):
        case (UnicodeCategory.ParagraphSeparator):
            return (true);
    }
    return (false);
}
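The characters outside Latin-1 depend on the Unicode data shipped with the runtime, so the safest way to get the exact list is to enumerate it on the target machine. The following is a rough sketch with hypothetical names (WhiteSpaceByCategory, Describe); it uses only the public Char.IsWhiteSpace and CharUnicodeInfo.GetUnicodeCategory methods and prints each character above U+00FF together with its category:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class WhiteSpaceByCategory
{
    // Hypothetical helper: one "\uXXXX Category" entry for every char
    // above Latin-1 that Char.IsWhiteSpace reports as white space.
    public static List<string> Describe()
    {
        var result = new List<string>();
        for (int i = 0x100; i <= 0xFFFF; i++)
        {
            char c = (char)i;
            if (char.IsWhiteSpace(c))
            {
                result.Add("\\u" + i.ToString("X4") + " " + CharUnicodeInfo.GetUnicodeCategory(c));
            }
        }
        return result;
    }

    static void Main()
    {
        foreach (string line in Describe())
        {
            Console.WriteLine(line);
        }
    }
}
```

The exact output can differ between framework versions because the category of a few characters has changed across Unicode releases, which is why no fixed list is given here.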
Regarding "c# - A list of all characters removed by String.Trim()?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/37333250/