In Lua, I am trying to pattern match and capture:
+384 Critical Strike (Reforged from Parry Chance)
as
(+384) (Critical Strike)
where the suffix
(Reforged from %s)
is optional.
The long version
I am trying to match a string in Lua using patterns (i.e. strfind).
Note: In Lua they don't call them regular expressions, they call them patterns because they're not regular.
Example strings:
+384 Critical Strike
+1128 Hit
This breaks down into the two parts I want to capture:
+384
Critical Strike
I can capture these with a fairly simple pattern. This pattern works in Lua:
local text = "+384 Critical Strike";
local pattern = "([%+%-]%d+) (.+)";
local _, _, value, stat = strfind(text, pattern);
+384
Critical Strike
The tricky part
Now I need to extend the pattern to include an optional suffix:
+384 Critical Strike (Reforged from Parry Chance)
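which should still break down into the same two parts, +384 and Critical Strike.
Note: I don't particularly care about the optional trailing suffix; I don't need to capture it, although capturing it would be handy.
This is where I start running into greedy-capture problems. Right away, the pattern I already have does something I don't want: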
([%+%-]%d+) (.+)
+384
Critical Strike (Reforged from Parry Chance)
But let's try including the suffix in the pattern:
pattern = "([%+%-]%d+) (.+)( %(Reforged from .+%))?"
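I am using the ? operator to indicate 0 or 1 occurrences of the suffix, but that matches nothing. I blindly tried changing the optional suffix group from parentheses ( to square brackets [:
pattern = "([%+%-]%d+) (.+)[ %(Reforged from .+%)]?"
But now the match is greedy again: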
+384
Critical Strike (Reforged from Parry Chance)
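A short sketch of why both attempts behave this way, assuming Lua 5.1 or later (where string.find and string.match are available; strfind is assumed to behave like string.find). In Lua patterns a quantifier applies only to a single character class, never to a parenthesized capture, so a ? placed after ) is read as a literal question mark. Likewise, [ ... ] is a character set that matches one character, not a group.

```lua
local text = "+384 Critical Strike (Reforged from Parry Chance)"
local grouped = "([%+%-]%d+) (.+)( %(Reforged from .+%))?"

-- The "?" after the closing parenthesis is a literal question mark,
-- so the pattern cannot match text that contains no "?".
local noMatch = string.find(text, grouped)

-- Appending a literal "?" to the subject makes the same pattern match.
local _, _, value, stat, suffix = string.find(text .. "?", grouped)

-- The bracketed version is a set of single characters followed by "?":
-- it matches at most ONE character drawn from " (Reforgedfrom.+)".
local set = "^[ %(Reforged from .+%)]?$"
local one = string.match("R", set)   -- one character from the set
local two = string.match("Re", set)  -- two characters cannot match

print(noMatch, value, stat, suffix, one, two)
```

Since the set can only ever consume one character, the greedy (.+) before it is free to swallow the whole suffix, which is the behaviour observed above.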
Based on the Lua pattern reference:
- x: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself.
- .: (a dot) represents all characters.
- %a: represents all letters.
- %c: represents all control characters.
- %d: represents all digits.
- %l: represents all lowercase letters.
- %p: represents all punctuation characters.
- %s: represents all space characters.
- %u: represents all uppercase letters.
- %w: represents all alphanumeric characters.
- %x: represents all hexadecimal digits.
- %z: represents the character with representation 0.
- %x: (where x is any non-alphanumeric character) represents the character x. This is the standard way to escape the magic characters. Any punctuation character (even the non-magic) can be preceded by a '%' when used to represent itself in a pattern.
- [set]: represents the class which is the union of all characters in set. A range of characters can be specified by separating the end characters of the range with a '-'. All classes %x described above can also be used as components in set. All other characters in set represent themselves. For example, [%w_] (or [_%w]) represents all alphanumeric characters plus the underscore, [0-7] represents the octal digits, and [0-7%l%-] represents the octal digits plus the lowercase letters plus the '-' character. The interaction between ranges and classes is not defined. Therefore, patterns like [%a-z] or [a-%%] have no meaning.
- [^set]: represents the complement of set, where set is interpreted as above.
For all classes represented by single letters (%a, %c, etc.), the corresponding uppercase letter represents the complement of the class. For instance, %S represents all non-space characters.
The definitions of letter, space, and other character groups depend on the current locale. In particular, the class [a-z] may not be equivalent to %l.
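and the magic matchers:
- *: matches 0 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
- +: matches 1 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
- -: also matches 0 or more repetitions of characters in the class. Unlike '*', these repetition items will always match the shortest possible sequence;
- ?: matches 0 or 1 occurrence of a character in the class;
I noticed that there is a greedy * and a non-greedy - modifier. Since my middle string matcher:
(%d) (%s) (%s)
seems to keep absorbing text all the way to the end, perhaps I should try to make it non-greedy by changing the * to a -:
oldPattern = "([%+%-]%d+) (.*)[ %(Reforged from .+%)]?"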
newPattern = "([%+%-]%d+) (.-)[ %(Reforged from .+%)]?"
Except that now it fails to match:
+384
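A minimal sketch of what the non-greedy version is doing, assuming Lua 5.1 or later. A lazy item such as .- expands only as far as the rest of the pattern forces it to. Here everything after it is optional, so it is satisfied with zero characters and the stat capture comes back as an empty string. The anchored pattern in the second half is only an illustration, not the asker's pattern.

```lua
local text = "+384 Critical Strike"

-- Lazy capture followed only by an optional item: nothing forces it to grow.
local value, stat = string.match(text, "([%+%-]%d+) (.-)[ %(Reforged from .+%)]?")

-- Anchoring the end with "$" forces the lazy capture to extend to the end.
local value2, stat2 = string.match(text, "^([%+%-]%d+) (.-)$")

-- The same greedy/lazy difference on a trivial subject:
local greedy = string.match("aaa", "a*")  -- longest possible run
local lazy = string.match("aaa", "a-")    -- shortest possible run (empty)

print(value, "[" .. stat .. "]", value2, stat2, greedy, "[" .. lazy .. "]")
```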
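Instead of having the middle group capture "any" character (i.e. .), I tried a set that contains everything except (:
pattern = "([%+%-]%d+) ([^%(]*)( %(Reforged from .+%))?"
And from there the wheels came off the wagon: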
local pattern = "([%+%-]%d+) ([^%(]*)( %(Reforged from .+%))?"
local pattern = "([%+%-]%d+) ((^%()*)( %(Reforged from .+%))?"
local pattern = "([%+%-]%d+) (%a )+)[ %(Reforged from .+%)]?"
I thought I was close with:
local pattern = "([%+%-]%d+) ([%a ]+)[ %(Reforged from .+%)]?"
which captures
- value = "+385"
- stat = "Critical Strike " (notice the trailing space)
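One possible way to live with that trailing space is to trim it after the match rather than inside the pattern. This is only a sketch, assuming Lua 5.1 or later; the variable names are illustrative.

```lua
local text = "+384 Critical Strike (Reforged from Parry Chance)"
local pattern = "([%+%-]%d+) ([%a ]+)[ %(Reforged from .+%)]?"

local value, stat = string.match(text, pattern)

-- [%a ]+ greedily takes letters and spaces, including the space before "(".
-- Strip trailing whitespace; the outer parentheses drop gsub's second
-- return value (the substitution count).
local trimmed = (string.gsub(stat, "%s+$", ""))

print(value, "[" .. stat .. "]", "[" .. trimmed .. "]")
```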
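So this is where I bang my head against the pillow and go to sleep; I can't believe I've spent four hours on this regex... pattern.
@NicolBolas The set of all possible strings, defined using a pseudo regular-expression language, is:
+%d %s (Reforged from %s)
where
- + represents either the Plus Sign (+) or the "Minus Sign" (-)
- %d represents any Latin digit character (e.g. 0..9)
- %s represents any Latin uppercase or lowercase letter, or embedded spaces (e.g. A-Za-z)
If I had to write it as a regular expression, this is what would obviously try to do what I want: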
\+\-\d+ [\w\s]+( \(Reforged from [\w\s]+\))?
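But in case I haven't explained it well enough, I can give you a nearly complete list of all the values I am likely to encounter in the wild.
- +123 Parry: positive number, single word
- +123 Critical Strike: positive number, two words
- -123 Parry: negative number, single word
- -123 Critical Strike: negative number, two words
- +123 Parry (Reforged from Dodge): positive number, single word, optional suffix present with single word
- +123 Critical Strike (Reforged from Dodge): positive number, two words, optional suffix present with single word
- -123 Parry (Reforged from Hit Chance): negative number, single word, optional suffix present with two words
- -123 Critical Strike (Reforged from Hit Chance): negative number, two words, optional suffix present with two words
There are also bonus strings that the pattern should obviously match as well:
- +1234 Critical Strike Chance: four-digit number, three words
- +12345 Mount and run speed increase: five-digit number, five words
- +123456 Mount and run speed increase: six-digit number, five words
- -1 MoUnT aNd RuN sPeEd InCrEaSe: one-digit number, five words
- -1 HiT (Reforged from CrItIcAl StRiKe ChAnCe): negative one-digit number, one word, optional suffix present with three words
While the ideal pattern should match the bonus entries above, it does not have to.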
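The values listed above can double as test cases. As a hedged sketch (not the accepted answer's approach), one way to sidestep the optional-group limitation is to try a pattern that requires the suffix first and fall back to the plain pattern when it does not match. This assumes Lua 5.1 or later; parseStat is a hypothetical helper name, and returning the reforged source as a third value is an assumption.

```lua
-- Hypothetical helper: returns value, stat, and the reforged source (or nil).
local function parseStat(text)
  -- First try: the suffix is required, so the lazy (.-) must stop right
  -- before the literal " (Reforged from ".
  local value, stat, source =
    string.match(text, "^([%+%-]%d+) (.-) %(Reforged from (.+)%)$")
  if value then
    return value, stat, source
  end
  -- Fallback: no suffix, so everything after the number is the stat.
  value, stat = string.match(text, "^([%+%-]%d+) (.+)$")
  return value, stat, nil
end

local samples = {
  "+123 Parry",
  "-123 Critical Strike",
  "+123 Critical Strike (Reforged from Dodge)",
  "-1 HiT (Reforged from CrItIcAl StRiKe ChAnCe)",
  "+12345 Mount and run speed increase",
}

for _, text in ipairs(samples) do
  local value, stat, source = parseStat(text)
  print(value, stat, tostring(source))
end
```

Two simple patterns are easier to reason about than one pattern that has to emulate an optional group, at the cost of a second match attempt on strings without the suffix.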
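Localization
In reality, all of the "numbers" I am trying to parse will be localized, e.g.:
- +123,456: English (en-US)
- +123.456: German (de-DE)
- +123'456: French (fr-CA)
- +123 456: Estonian (et-EE)
- +1,23,456: Assamese (as-IN)
Any answer must not attempt to account for these localization issues. You do not know the locale a number will be presented in, which is why number localization was removed from the question. You must strictly assume that the numbers contain only plus sign, hyphen minus, and the Latin digits 0 through 9. I already know how to parse localized numbers. This question is about trying to match an optional suffix with a greedy pattern parser.
Edit: You really didn't have to try to handle localized numbers. At some level, trying to handle them without knowing the locale is wrong. For one, I did not include all possible number localizations. For another, I don't know what localizations might exist in the future.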
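Best answer
Hmm, I don't have Lua 4 installed, but this pattern works under Lua 5. I would expect it to work under Lua 4 as well.
Update 1: Since additional requirements (localization) have been specified, I have adjusted the pattern and the tests to reflect them.
Update 2: Updated the pattern and tests to handle an additional class of text containing a number, as mentioned by @IanBoyd in the comments. Added an explanation of the string pattern.
Update 3: Added a variation for the case where the localized number is handled separately, as mentioned in the last update to the question.
Try: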
"(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"
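or (without trying to validate the number localization tokens), just take anything that is not a letter, with a digit sentinel at the end of the number: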
"(([%+%-][^%a]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"
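Neither of the patterns above is meant to handle numbers in scientific notation (e.g. 1.23e+10).
Lua 5 test (edited to clean up; the tests were getting cluttered):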
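-- Run every string in tab against pattern and print the four captures.
function test(tab, pattern)
    for i, v in ipairs(tab) do
        -- f1 = whole match, f2 = signed number, f3 = stat text, f4 = optional suffix
        local f1, f2, f3, f4 = v:match(pattern)
        -- tostring() guards against nil captures when a string does not match
        print(string.format("Test{%d} - Whole:{%s}\nFirst:{%s}\nSecond:{%s}\nThird:{%s}\n",
            i, tostring(f1), tostring(f2), tostring(f3), tostring(f4)))
    end
end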
local pattern = "(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"
local testing = {"+123 Parry",
"+123 Critical Strike",
"-123 Parry",
"-123 Critical Strike",
"+123 Parry (Reforged from Dodge)",
"+123 Critical Strike (Reforged from Dodge)",
"-123 Parry (Reforged from Hit Chance)",
"-123 Critical Strike (Reforged from Hit Chance)",
"+122384 Critical Strike (Reforged from parry chance)",
"+384 Critical Strike ",
"+384Critical Strike (Reforged from parry chance)",
"+1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+12345 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123456 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)",
"-1 MoUnT aNd RuN sPeEd InCrEaSe (Reforged from CrItIcAl StRiKe ChAnCe)",
"-1 HiT (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123,456 +1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123.456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123'456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+123 456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+1,23,456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
"+9 mana every 5 sec",
"-9 mana every 20 min (Does not occurr in data but gets captured if there)"}
test(testing, pattern)
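To make the expected captures concrete, here is a small sketch that runs the first pattern from this answer against three inputs, assuming Lua 5.1 or later. The captures helper is a hypothetical name introduced only for this illustration.

```lua
local pattern = "(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"

-- Hypothetical helper: returns whole match, number, stat, suffix.
local function captures(text)
  return string.match(text, pattern)
end

local _, n1, s1, x1 = captures("+384 Critical Strike (Reforged from Parry Chance)")
local _, n2, s2, x2 = captures("+123 Parry")
local _, n3, s3, x3 = captures("+384 Critical Strike ")

print(n1, "[" .. s1 .. "]", "[" .. x1 .. "]")
print(n2, "[" .. s2 .. "]", "[" .. x2 .. "]")
print(n3, "[" .. s3 .. "]", "[" .. x3 .. "]")
```

Because the stat capture ends with [%a]+, it cannot end in whitespace, so no trimming is needed. When the suffix is absent, the fourth capture is an empty string rather than nil. The stat capture is built from three items that each need at least one character, so a stat shorter than three characters would not match.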
Here is a breakdown of the pattern:
local explainPattern =
"(" -- start whole string capture
..
--[[
capture localized number with sign -
take at first as few digits and separators as you can
ensuring the capture ends with at least 1 digit
(the last digit is our sentinel enforcing the boundary)]]
"([%+%-][',%.%d%s]-[%d]+)"
..
--[[
gobble as much space as you can]]
"%s*"
..
--[[
capture start with letters, followed by anything which is not a bracket
ending with at least 1 letter]]
"([%a]+[^%(^%)]+[%a]+)"
..
--[[
gobble as much space as you can]]
"%s*"
..
--[[
capture an optional bracket
followed by 0 or more letters and spaces
ending with an optional bracket]]
"(%(?[%a%s]*%)?)"
..
")" -- end whole string capture
Regarding lua - greedy/non-greedy pattern matching and an optional suffix in Lua, the original question can be found on Stack Overflow: https://stackoverflow.com/questions/13619193/