java - Difference between String trim() and strip() methods in Java 11

Tags: java string trim strip java-11

Among other changes, JDK 11 introduces 6 new methods for the java.lang.String class:

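  • repeat(int) - repeats the String as many times as given by the int parameter
  • lines() - uses a Spliterator to lazily provide the lines of the source string
  • isBlank() - indicates whether the String is empty or contains only white space characters
  • stripLeading() - removes white space from the beginning
  • stripTrailing() - removes white space from the end
  • strip() - removes white space from both the beginning and the end of the string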
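
To make the list above concrete, here is a minimal sketch that calls each of the six methods once. The class name `NewStringMethodsDemo` and the helper `splitLines` are made up for illustration; the String methods themselves require Java 11 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NewStringMethodsDemo {
    // Hypothetical helper: collects the lazily produced lines into a list.
    public static List<String> splitLines(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("ab".repeat(3));                       // ababab
        System.out.println(splitLines("a\nb\r\nc"));              // [a, b, c]
        System.out.println("  \t ".isBlank());                    // true
        System.out.println("[" + "  x  ".stripLeading() + "]");   // [x  ]
        System.out.println("[" + "  x  ".stripTrailing() + "]");  // [  x]
        System.out.println("[" + "  x  ".strip() + "]");          // [x]
    }
}
```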

In particular, strip() looks very similar to trim(). According to this article, the strip*() methods are designed to:

The String.strip(), String.stripLeading(), and String.stripTrailing() methods trim white space [as determined by Character.isWhiteSpace()] off either the front, back, or both front and back of the targeted String.

The String.trim() JavaDoc states:

/**
  * Returns a string whose value is this string, with any leading and trailing
  * whitespace removed.
  * ...
  */

This is almost identical to the quote above.

What exactly is the difference between String.trim() and String.strip() since Java 11?

Best answer

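In short: strip() is the "Unicode-aware" evolution of trim(). That is, trim() removes only characters <= U+0020 (space), while strip() removes all Unicode white space characters (but not all control characters, such as \0).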
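
A small sketch of that difference, using two sample inputs chosen for illustration: U+2003 (EM SPACE) is Unicode white space above U+0020, while U+0000 (NUL) is a control character that Character.isWhitespace does not count as white space. The class name `TrimVsStrip` is arbitrary.

```java
public class TrimVsStrip {
    public static void main(String[] args) {
        String emSpaced = "\u2003hello\u2003";   // padded with U+2003 EM SPACE
        String nulPadded = "\u0000hello\u0000";  // padded with U+0000 NUL

        System.out.println(emSpaced.trim().length());    // 7: trim() leaves U+2003 in place
        System.out.println(emSpaced.strip().length());   // 5: strip() removes it
        System.out.println(nulPadded.trim().length());   // 5: trim() removes anything <= U+0020
        System.out.println(nulPadded.strip().length());  // 7: strip() keeps NUL
    }
}
```

For plain ASCII spaces, tabs and newlines the two methods give the same result, so the difference only shows up with inputs like these.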

CSR : JDK-8200378

Problem

String::trim has existed from early days of Java when Unicode had not fully evolved to the standard we widely use today.

The definition of space used by String::trim is any code point less than or equal to the space code point (\u0020), commonly referred to as ASCII or ISO control characters.

Unicode-aware trimming routines should use Character::isWhitespace(int).

Additionally, developers have not been able to specifically remove indentation white space or to specifically remove trailing white space.

Solution

Introduce trimming methods that are Unicode white space aware and provide additional control of leading only or trailing only.

A common characteristic of these new methods is that they use a different (newer) definition of "white space" than older methods such as String.trim(). Bug JDK-8200373.

The current JavaDoc for String::trim does not make it clear which definition of "space" is being used in the code. With additional trimming methods coming in the near future that use a different definition of space, clarification is imperative. String::trim uses the definition of space as any codepoint that is less than or equal to the space character codepoint (\u0020.) Newer trimming methods will use the definition of (white) space as any codepoint that returns true when passed to the Character::isWhitespace predicate.
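
The two definitions of "space" quoted above can be written as two predicates. This is only a sketch of the definitions as described in the bug report, not the JDK's actual implementation; the names `isTrimSpace` and `isStripSpace` are invented here.

```java
public class SpaceDefinitions {
    // Definition attributed to String::trim: any code point <= U+0020.
    public static boolean isTrimSpace(int codePoint) {
        return codePoint <= ' ';
    }

    // Definition attributed to the newer methods: Character::isWhitespace.
    public static boolean isStripSpace(int codePoint) {
        return Character.isWhitespace(codePoint);
    }

    public static void main(String[] args) {
        // NUL, TAB, SPACE, NO-BREAK SPACE, EM SPACE, IDEOGRAPHIC SPACE
        int[] samples = {0x0000, 0x0009, 0x0020, 0x00A0, 0x2003, 0x3000};
        for (int cp : samples) {
            System.out.printf("U+%04X trim=%b strip=%b%n",
                    cp, isTrimSpace(cp), isStripSpace(cp));
        }
    }
}
```

Both predicates accept U+0009 and U+0020. Only the first accepts U+0000, only the second accepts U+2003 and U+3000, and neither accepts U+00A0, because Character.isWhitespace excludes the non-breaking spaces.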

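The isWhitespace(char) method was added to Character in JDK 1.1, but the isWhitespace(int) method was not introduced to the Character class until JDK 1.5. The latter method (the one accepting a parameter of type int) was added to support supplementary characters. The Javadoc comments for the Character class define supplementary characters (typically modeled with an int-based "code point") versus BMP characters (typically modeled with a single char):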

The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values ... A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. ... The methods that only accept a char value cannot support supplementary characters. ... The methods that accept an int value support all Unicode characters, including supplementary characters.
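
As an illustration of the char versus int distinction in the quoted Javadoc, the sketch below uses U+1F600, an arbitrarily chosen supplementary character written as its surrogate pair. The class name `CodePointDemo` is made up for the example.

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // 'a' followed by U+1F600, stored as the surrogate pair D83D DE00
        String s = "a\uD83D\uDE00";

        System.out.println(s.length());                       // 3 char values (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));  // 2 code points

        int cp = s.codePointAt(1);                            // reads the whole pair as one int
        System.out.println(Integer.toHexString(cp));          // 1f600
        System.out.println(Character.isSupplementaryCodePoint(cp)); // true
        System.out.println(Character.charCount(cp));          // 2
        System.out.println(Character.isWhitespace(cp));       // false
    }
}
```

A single char can only hold one half of such a pair, which is why a char-based method cannot classify the character as a whole.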

OpenJDK Changeset .


Benchmark comparison of trim() and strip() - Why is String.strip() 5 times faster than String.trim() for blank string In Java 11

Regarding "java - Difference between String trim() and strip() methods in Java 11", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51266582/
