I used the following tool to build a regex for mentions and hashtags. I have managed to match what I want in the inserted text, but I still need to solve the following matching problems.
Only match substrings that are delimited by whitespace on both sides. A valid substring (hashtag or mention) at the very beginning or at the very end of the string should also be matched.
The matches must contain only the part without spaces: the spaces are part of the rule, but not part of the matched substring.
The regex I am using is: (([@]{1}|[#]{1})[A-Za-z0-9]+)
Some examples of strings, with the matches that are valid and invalid:
"@hello friend" - @hello must be matched as a mention.
"@ hello friend" - here there should be no matches.
"hey@hello @hello" - here only the last @hello must be matched as a mention.
"@hello! hi @hello #hi ##hello" - here only the second @hello and #hi must be matched as a mention and hashtag respectively.
Another example is in the image, where only "@word" should be a valid mention.
Update, March 15, 2018, 16:35 (GMT-4)
I found a solution to the problem by using the tool in PCRE mode (server) with a negative lookbehind and a negative lookahead:
(?<![^\s])(([@]{1}|[#]{1})[A-Za-z0-9]+)(?![^\s])
But now a doubt arises: will the negative lookahead and negative lookbehind work with regular expressions in C#? In JavaScript, for example, it does not work; the tool underlines the pattern in red.
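For what it is worth, .NET's System.Text.RegularExpressions engine supports both negative lookbehind and negative lookahead, so the pattern from the update can be tried in C# unchanged. The following is only a minimal sketch: the class name LookaroundDemo and the method FindTags are made up for illustration, and the sample strings are the examples listed in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Pattern from the update: the tag must not be preceded or followed
    // by a non-whitespace character.
    private static readonly Regex TagPattern =
        new Regex(@"(?<![^\s])(([@]{1}|[#]{1})[A-Za-z0-9]+)(?![^\s])");

    // Hypothetical helper: returns every mention/hashtag found in the input.
    public static List<string> FindTags(string input)
    {
        var tags = new List<string>();
        foreach (Match m in TagPattern.Matches(input))
        {
            // The lookarounds consume nothing, so m.Value has no spaces.
            tags.Add(m.Value);
        }
        return tags;
    }

    public static void Main()
    {
        string[] samples =
        {
            "@hello friend",
            "@ hello friend",
            "hey@hello @hello",
            "@hello! hi @hello #hi ##hello"
        };
        foreach (string sample in samples)
        {
            string joined = string.Join(", ", FindTags(sample));
            Console.WriteLine("\"{0}\" -> [{1}]", sample, joined);
        }
    }
}
```

Because the lookbehind and lookahead are zero-width, the surrounding spaces act only as a condition and never end up in the match, which is the behaviour the question asks for.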
Best answer
Try this pattern:
(?:^|\s+)(?:(?<mention>@)|(?<hash>#))(?<item>\w+)(?=\s+)
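Here is the breakdown:
- (?: creates a non-capturing group
- ^|\s+ matches the beginning of the string or whitespace
- (?: creates a non-capturing group
- (?<mention>@)|(?<hash>#) creates a group to match @ or # and names the groups mention and hash respectively
- (?<item>\w+) matches one or more word characters (letters, digits, or underscore) and captures them in a named group so the item is easy to extract
- (?=\s+) creates a positive lookahead that requires whitespace after the item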
You will then need to use the host language to trim the returned matches and remove any leading/trailing whitespace.
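As a rough illustration in C#, the named groups can be read directly, which sidesteps trimming the whole match. This sketch is a variation rather than the answer's own code: the names NamedGroupDemo and FindTags are hypothetical, and the trailing lookahead is changed from (?=\s+) to (?=\s|$), an assumption made here so that a tag at the very end of the string also qualifies.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // The pattern from the answer, except for the final lookahead:
    // (?=\s|$) is an assumed tweak that also accepts the end of the string.
    private static readonly Regex TagPattern = new Regex(
        @"(?:^|\s+)(?:(?<mention>@)|(?<hash>#))(?<item>\w+)(?=\s|$)");

    // Hypothetical helper: rebuilds each tag from the named groups,
    // so the leading whitespace consumed by (?:^|\s+) is never included.
    public static List<string> FindTags(string input)
    {
        var tags = new List<string>();
        foreach (Match m in TagPattern.Matches(input))
        {
            // Exactly one of the two groups participates in each match.
            string lead = m.Groups["mention"].Success ? "@" : "#";
            tags.Add(lead + m.Groups["item"].Value);
        }
        return tags;
    }

    public static void Main()
    {
        string joined = string.Join(", ", FindTags("@hello! hi @hello #hi ##hello"));
        Console.WriteLine(joined);
    }
}
```

Note that \w is broader than the [A-Za-z0-9] class in the question: it also accepts the underscore, and in .NET it accepts Unicode letters and digits by default.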
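Update: Since you mentioned that you are using C#, I thought I would provide a .NET solution that solves your problem without RegEx. I have not benchmarked it, but I would guess that it is also faster than using RegEx.
Personally, my flavor of .NET is Visual Basic, so I am providing a VB.NET solution, but you can easily run it through a converter, since I never use anything that cannot be used in C#: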
Private Function FindTags(ByVal lead As Char, ByVal source As String) As String()
    Dim matches As New List(Of String)
    Dim current_index As Integer = 0

    'Loop through all but the last character in the source
    For index As Integer = 0 To source.Length - 2
        'Reset the current index
        current_index = index

        'A tag starts at the lead character ("@" or "#") when it is at the beginning of the
        'String or preceded by whitespace, and the next character is a letter or digit
        If source(index) = lead AndAlso (index = 0 OrElse Char.IsWhiteSpace(source, index - 1)) AndAlso Char.IsLetterOrDigit(source, index + 1) Then
            'Loop until the next character is no longer a letter or digit
            Do
                current_index += 1
            Loop While current_index + 1 < source.Length AndAlso Char.IsLetterOrDigit(source, current_index + 1)

            'Check if we're at the end of the line or the next character is whitespace
            If current_index = source.Length - 1 OrElse Char.IsWhiteSpace(source, current_index + 1) Then
                'Add the match to the collection
                matches.Add(source.Substring(index, current_index + 1 - index))
            End If
        End If
    Next

    Return matches.ToArray()
End Function
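For readers who would rather not run the VB.NET through a converter, a hand-written C# port might look roughly like the sketch below. The class name TagScanner is hypothetical, and the sketch assumes that a lead character only starts a tag when a letter or digit follows it immediately.

```csharp
using System;
using System.Collections.Generic;

public static class TagScanner
{
    // Returns every tag that starts with `lead`, consists of letters/digits,
    // and is bounded by whitespace or by the edges of the string.
    public static string[] FindTags(char lead, string source)
    {
        var matches = new List<string>();

        // A tag needs at least one character after the lead,
        // so the last character can never start a tag.
        for (int index = 0; index < source.Length - 1; index++)
        {
            bool startsTag = source[index] == lead
                && (index == 0 || char.IsWhiteSpace(source, index - 1))
                && char.IsLetterOrDigit(source, index + 1);
            if (!startsTag) continue;

            // Advance to the last letter or digit of the candidate tag.
            int last = index + 1;
            while (last + 1 < source.Length && char.IsLetterOrDigit(source, last + 1))
            {
                last++;
            }

            // Accept only if the tag reaches the end of the string
            // or is followed by whitespace.
            if (last == source.Length - 1 || char.IsWhiteSpace(source, last + 1))
            {
                matches.Add(source.Substring(index, last + 1 - index));
            }
        }
        return matches.ToArray();
    }

    public static void Main()
    {
        string text = "@hello! hi @hello #hi ##hello";
        Console.WriteLine(string.Join(", ", FindTags('@', text))); // @hello
        Console.WriteLine(string.Join(", ", FindTags('#', text))); // #hi
    }
}
```

The function handles one lead character per call, so mentions and hashtags are collected with two separate calls, one with '@' and one with '#'.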
Regarding "c# - How to fix this regex for mentions and hashtags?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49308174/