How can I find the word that comes before any of the characters [¹²³⁴⁵⁶⁷⁸⁹⁰]? For example:
let myString = "Regular expressions¹ consist of constants, ² and operator symbols...³"
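Please provide a pattern that selects the characters from the start of the target word up to and including the superscript: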
"expressions¹", "constants, ²", "symbols...³"
and a pattern that selects only the target word:
"expressions", "constants", "symbols"
Best answer
This will match your examples.
Using code points:
\b\w+\W*[\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
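A minimal sketch of how the pattern above might be tried out, assuming Python's `re` module rather than the engine the question targets. Python has no `\x{B9}` syntax, so the code points are written as `\u00B9` and so on, and `re.ASCII` is an assumption made here so that superscript digits are treated as non-word characters. The capture group around `\w+` is an addition for illustration, one possible way to get only the target word.

```python
import re

# ¹²³ sit in Latin-1 (U+00B9, U+00B2, U+00B3); ⁴-⁹ and ⁰ sit at U+2074..U+2079 and U+2070.
SUP = r"\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079\u2070"

# re.ASCII restricts \w, \W and \b to ASCII, so superscripts count as non-word characters.
# Group 1 (added for illustration) captures only the target word.
PATTERN = re.compile(rf"\b(\w+)\W*[{SUP}]+", re.ASCII)

my_string = "Regular expressions¹ consist of constants, ² and operator symbols...³"

# Whole match: from the start of the word through the superscript(s).
full_matches = [m.group(0) for m in PATTERN.finditer(my_string)]
# Group 1: the word alone.
target_words = [m.group(1) for m in PATTERN.finditer(my_string)]

print(full_matches)
print(target_words)
```

Here `full_matches` corresponds to the first list requested in the question and `target_words` to the second.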
From Wikipedia:
The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into those positions in the Latin-1 range of Unicode. The rest were placed in a dedicated section of Unicode at U+2070 to U+209F.
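The split described in the quote is easy to check. The short Python sketch below (Python is an assumption here, any language with access to code points would do) prints the code point of each superscript digit, which shows why the character class lists them individually instead of using one range.

```python
superscripts = "¹²³⁴⁵⁶⁷⁸⁹⁰"

# Format each character's code point in the usual U+XXXX notation.
code_points = [f"U+{ord(ch):04X}" for ch in superscripts]

print(code_points)
```

The first three land in the Latin-1 range, while the remaining seven fall in U+2070 to U+2079.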
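Update:
To get separate blocks that start with either a word or a non-word, you can
exclude the superscript range from the non-word class.
The regex is longer and more redundant, but it works.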
(?:\b\w+[^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]*|[^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+)[\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
Formatted:
(?:
\b
# Required - Words
\w+
# Optional - Not words, nor superscript
[^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]*
| # or,
# Required - Not words, nor superscript
[^\w\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
)
# Required - Superscript
[\x{B9}\x{B2}\x{B3}\x{2074}\x{2075}\x{2076}\x{2077}\x{2078}\x{2079}\x{2070}]+
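As a rough illustration of what "separate blocks" means, the formatted pattern can be transcribed into Python's verbose mode. This is a sketch under assumptions: Python's `re` is used in place of the original engine, `\x{...}` escapes become `\u....`, `re.ASCII` keeps superscripts out of `\w`, and the `sample` string is made up for the demonstration.

```python
import re

# Superscript digits as regex escapes: ¹²³ (Latin-1) plus ⁴⁵⁶⁷⁸⁹⁰ (U+2070 block).
SUP = r"\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079\u2070"

# The simpler pattern: \W* is free to swallow earlier superscripts.
SIMPLE = re.compile(rf"\b\w+\W*[{SUP}]+", re.ASCII)

# The updated pattern: superscripts are excluded from the non-word class.
BLOCKS = re.compile(
    rf"""
    (?:
        \b \w+            # required: a word
        [^\w{SUP}]*       # optional: neither word characters nor superscripts
      |
        [^\w{SUP}]+       # or required: neither word characters nor superscripts
    )
    [{SUP}]+              # required: one or more superscripts
    """,
    re.VERBOSE | re.ASCII,
)

sample = "x¹ , ² y³"  # hypothetical input, not from the question

simple_result = SIMPLE.findall(sample)
blocks_result = BLOCKS.findall(sample)

print(simple_result)
print(blocks_result)
```

With the simpler pattern the first match runs across `¹` and on to `²`, whereas the updated pattern stops at each superscript and reports the word-led and punctuation-led pieces separately.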
Regarding regex - finding the word that precedes a set of symbols, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33875363/