lua - Greedy/non-greedy pattern matching and optional suffixes in Lua

Tags: lua

In Lua, I am trying to pattern match and capture:

+384 Critical Strike (Reforged from Parry Chance)

as
(+384) (Critical Strike)

where the suffix (Reforged from %s) is optional.

Long version

I am trying to match a string in Lua using patterns (i.e. strfind).

Note: In Lua they don't call them regular expressions, they call them patterns because they're not regular.



Example strings:
+384 Critical Strike
+1128 Hit

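This breaks down into two parts that I want to capture:

  • the number, with a leading positive or negative indicator; in this case +384
  • the string; in this case Critical Strike

I can capture these using a fairly simple pattern.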

This pattern works in Lua:
    local text = "+384 Critical Strike";
    local pattern = "([%+%-]%d+) (.+)";
    local _, _, value, stat = strfind(text, pattern);
    
  • value = +384
  • stat = Critical Strike
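
For anyone trying this outside the original host, here is a minimal sketch of the same match on stock Lua 5.x. It assumes the global strfind may be missing and falls back to string.find; the results table is only there for illustration.

```lua
-- Assumption: hosts without a global strfind (stock Lua 5.x) use string.find instead.
local strfind = strfind or string.find

local pattern = "([%+%-]%d+) (.+)"
local results = {}

-- The two example strings from the question.
for _, text in ipairs({ "+384 Critical Strike", "+1128 Hit" }) do
  local _, _, value, stat = strfind(text, pattern)
  results[#results + 1] = { value = value, stat = stat }
  print(value, stat)
end
```

The first two return values of the find call are the start and end positions, which is why they are discarded into `_`.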

The tricky part

Now I need to expand that pattern to include an optional suffix:
    +384 Critical Strike (Reforged from Parry Chance)
    

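which breaks down into the number, the string, and the optional suffix.

Note: I don't particularly care about the optional trailing suffix, which means I have no requirement to capture it, although capturing it would be handy.

This is where I start running into problems with greedy capturing. Right away, the pattern I already have does what I don't want it to:
  • pattern = ([%+%-]%d+) (.+)
  • value = +384
  • stat = Critical Strike (Reforged from Parry Chance)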

But let's try to include the suffix in the pattern:
    pattern = "([%+%-]%d+) (.+)( %(Reforged from .+%))?"
    

I am using the ? operator to indicate 0 or 1 appearances of the suffix, but that matches nothing.
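
A likely explanation, sketched here under the assumption of stock Lua 5.x: the repetition operators apply only to a single character class, never to a parenthesized capture. A ? that follows a closing parenthesis is therefore treated as a literal question mark, and the suffix becomes mandatory. The variable names are illustrative.

```lua
local pattern = "([%+%-]%d+) (.+)( %(Reforged from .+%))?"
local plain = "+384 Critical Strike (Reforged from Parry Chance)"

-- The text does not end in a literal "?", so the whole pattern fails.
local miss = string.match(plain, pattern)
print(miss)

-- Append a literal "?" to the text and the very same pattern succeeds.
local value, stat, suffix = string.match(plain .. "?", pattern)
print(value, stat, suffix)
```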

I blindly tried changing the optional suffix group from parentheses ( to brackets [:
    pattern = "([%+%-]%d+) (.+)[ %(Reforged from .+%)]?"
    

But now the match is greedy again:
  • value = +384
  • stat = Critical Strike (Reforged from Parry Chance)
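
The brackets turn the suffix into a character set, so the trailing item is "at most one character out of this set" rather than "the optional text". A small sketch, assuming stock Lua 5.x, with the set anchored at both ends so it has to account for the entire input:

```lua
-- The set holds: space, "(", the letters of "Reforged from", ".", "+", ")".
local set = "^[ %(Reforged from .+%)]?$"

local one   = string.match("e", set)                       -- a single character from the set
local none  = string.match("", set)                        -- zero characters is also fine
local whole = string.match(" (Reforged from Dodge)", set)  -- the full suffix is far too long

print(one, none, whole)
```

Because the set can match zero characters, it places no limit on the greedy (.+) in front of it, which is why the stat swallows the suffix again.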

Based on the Lua pattern reference:

    • x: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself.
    • .: (a dot) represents all characters.
    • %a: represents all letters.
    • %c: represents all control characters.
    • %d: represents all digits.
    • %l: represents all lowercase letters.
    • %p: represents all punctuation characters.
    • %s: represents all space characters.
    • %u: represents all uppercase letters.
    • %w: represents all alphanumeric characters.
    • %x: represents all hexadecimal digits.
    • %z: represents the character with representation 0.
    • %x: (where x is any non-alphanumeric character) represents the character x. This is the standard way to escape the magic characters. Any punctuation character (even the non-magic) can be preceded by a '%' when used to represent itself in a pattern.
    • [set]: represents the class which is the union of all characters in set. A range of characters can be specified by separating the end characters of the range with a '-'. All classes %x described above can also be used as components in set. All other characters in set represent themselves. For example, [%w_] (or [_%w]) represents all alphanumeric characters plus the underscore, [0-7] represents the octal digits, and [0-7%l%-] represents the octal digits plus the lowercase letters plus the '-' character. The interaction between ranges and classes is not defined. Therefore, patterns like [%a-z] or [a-%%] have no meaning.
    • [^set]: represents the complement of set, where set is interpreted as above.

    For all classes represented by single letters (%a, %c, etc.), the corresponding uppercase letter represents the complement of the class. For instance, %S represents all non-space characters.

    The definitions of letter, space, and other character groups depend on the current locale. In particular, the class [a-z] may not be equivalent to %l.
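
To make the class list concrete, here is a small sketch run against the example string from the question. It assumes stock Lua 5.x and uses string.match; the local names are only for illustration.

```lua
local text = "+384 Critical Strike"

local digits  = string.match(text, "%d+")     -- first run of digits
local letters = string.match(text, "%a+")     -- first run of letters
local token   = string.match(text, "%S+")     -- first run of non-space characters
local sign    = string.match(text, "[%+%-]")  -- a set holding the two escaped sign characters
local dot     = string.match("a.b", "%.")     -- %. is a literal dot, not "any character"

print(digits, letters, token, sign, dot)
```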



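and the magic matchers:
  • *, which matches 0 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
  • +, which matches 1 or more repetitions of characters in the class. These repetition items will always match the longest possible sequence;
  • -, which also matches 0 or more repetitions of characters in the class. Unlike '*', these repetition items will always match the shortest possible sequence;
  • ?, which matches 0 or 1 occurrence of a character in the class;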
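
The difference between the four matchers is easiest to see side by side. A minimal sketch, assuming stock Lua 5.x; the sample text is the stat portion of the string from the question.

```lua
local s = "Critical Strike (Reforged from Parry Chance)"

-- "*" is greedy: take as much as possible while a space can still follow.
local greedy = string.match(s, "^(.*) ")
-- "-" is lazy: stop at the first place where a space can follow.
local lazy = string.match(s, "^(.-) ")
-- "+" needs at least one character and is greedy as well.
local word = string.match(s, "^(%a+)")
-- "?" makes a single character class optional.
local withSign    = string.match("+384", "^([%+%-]?)%d+$")
local withoutSign = string.match("384", "^([%+%-]?)%d+$")

print(greedy)
print(lazy)
print(word, withSign, withoutSign)
```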

I noticed that there is a greedy * and a non-greedy - modifier. Since my middle string matcher:
    (%d) (%s) (%s)
    

seems to be absorbing text all the way to the end, perhaps I should try making it non-greedy by changing the * to a -:
    oldPattern = "([%+%-]%d+) (.*)[ %(Reforged from .+%)]?"
    newPattern = "([%+%-]%d+) (.-)[ %(Reforged from .+%)]?"
    

Except now it fails to match:
  • value = +384
  • stat = nil
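
A sketch of what seems to be happening, assuming stock Lua 5.x: a lazy .- takes as little as it can, and since the set after it may match zero characters, nothing forces it to grow. The capture comes back as an empty string, which is easy to mistake for nothing at all. Adding a $ anchor (not part of the original attempt) forces the lazy item to expand.

```lua
local text = "+384 Critical Strike (Reforged from Parry Chance)"

-- Nothing after (.-) is mandatory, so the lazy capture stays empty.
local value, stat = string.match(text, "([%+%-]%d+) (.-)[ %(Reforged from .+%)]?")
print(value, "[" .. stat .. "]")

-- With an end anchor, the lazy capture has to stretch to the end of the text.
local value2, stat2 = string.match(text, "^([%+%-]%d+) (.-)$")
print(value2, stat2)
```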

Rather than having the middle group capture "any" character (i.e. .), I tried a set that contains everything except (:
    pattern = "([%+%-]%d+) ([^%(]*)( %(Reforged from .+%))?"
    

From there the wheels came off the wagon:
    local pattern = "([%+%-]%d+) ([^%(]*)( %(Reforged from .+%))?"
    local pattern = "([%+%-]%d+) ((^%()*)( %(Reforged from .+%))?"
    local pattern = "([%+%-]%d+) (%a )+)[ %(Reforged from .+%)]?"
    

I thought I was close with:
    local pattern = "([%+%-]%d+) ([%a ]+)[ %(Reforged from .+%)]?"
    

which captures
    - value = "+385"
    - stat = "Critical Strike "  (notice the trailing space)
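
The trailing space appears because the set [%a ] includes the space that sits in front of the opening parenthesis. One possible way around it, sketched here for stock Lua 5.x and not taken from the question, is to require the capture to end in a letter:

```lua
local text = "+384 Critical Strike (Reforged from Parry Chance)"

-- The attempt from the question: the greedy set swallows the space before "(".
local _, loose = string.match(text, "([%+%-]%d+) ([%a ]+)[ %(Reforged from .+%)]?")
print("[" .. loose .. "]")

-- Variation: letters and spaces, but the last captured character must be a letter.
local value, stat = string.match(text, "([%+%-]%d+) ([%a ]*%a)")
print(value, "[" .. stat .. "]")
```

This only trims the stat name; it does nothing to verify or capture the suffix.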
    

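So this is where I bang my head against the pillow and go to sleep; I can't believe I've spent four hours on this regex... pattern.

@NicolBolas The set of all possible strings, defined using a pseudo-regular expression language, is: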
    +%d %s (Reforged from %s)
    

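where
  • + represents either a Plus Sign (+) or a "Minus Sign" (-)
  • %d represents any Latin digit character (e.g. 0..9)
  • %s represents any Latin uppercase or lowercase letter, or embedded spaces (e.g. A-Za-z )
  • the remaining characters are literals.

If I had to write a regular expression that obviously tries to do what I want: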
    \+\-\d+ [\w\s]+( \(Reforged from [\w\s]+\))?
    

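But if I haven't explained it well enough, here is a nearly complete list of all the values I am likely to encounter in the wild:
  • +123 Parry    positive number, single word
  • +123 Critical Strike    positive number, two words
  • -123 Parry    negative number, single word
  • -123 Critical Strike    negative number, two words
  • +123 Parry (Reforged from Dodge)    positive number, single word, optional suffix present with single word
  • +123 Critical Strike (Reforged from Dodge)    positive number, two words, optional suffix present with single word
  • -123 Parry (Reforged from Hit Chance)    negative number, single word, optional suffix present with two words
  • -123 Critical Strike (Reforged from Hit Chance)    negative number, two words, optional suffix present with two words

There are also bonus cases that the pattern should obviously match as well:
  • +1234 Critical Strike Chance    four-digit number, three words
  • +12345 Mount and run speed increase    five-digit number, five words
  • +123456 Mount and run speed increase    six-digit number, five words
  • -1 MoUnT aNd RuN sPeEd InCrEaSe    negative one-digit number, five words
  • -1 HiT (Reforged from CrItIcAl StRiKe ChAnCe)    negative one-digit number, one word, optional suffix present with three words

While the ideal pattern should match the bonus entries above, it doesn't have to.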
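
Since Lua patterns have no optional groups, one common workaround is to try two anchored patterns in turn: first the form with the suffix, then the form without it. The sketch below assumes stock Lua 5.x and the strict input format listed above (sign, Latin digits, a single space, letters and spaces). The helper name parseStat and the two pattern constants are hypothetical.

```lua
-- Hypothetical helper; the names and the return shape are illustrative only.
local WITH_SUFFIX    = "^([%+%-]%d+) ([%a ]+) %(Reforged from ([%a ]+)%)$"
local WITHOUT_SUFFIX = "^([%+%-]%d+) ([%a ]+)$"

local function parseStat(text)
  -- First attempt: the suffix is present and is captured as a third value.
  local value, stat, from = string.match(text, WITH_SUFFIX)
  if value then
    return value, stat, from
  end
  -- Fallback: no suffix, so the third return value is nil.
  value, stat = string.match(text, WITHOUT_SUFFIX)
  return value, stat, nil
end

print(parseStat("+123 Parry"))
print(parseStat("-123 Critical Strike (Reforged from Hit Chance)"))
```

The anchors matter here: without ^ and $ the fallback pattern would accept a prefix of a suffixed string, and the greedy set would again keep a trailing space.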

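Localization

In reality, all the "numbers" I am trying to parse will be localized, e.g.:
  • +123,456 in English (en-US)
  • +123.456 in German (de-DE)
  • +123'456 in French (fr-CA)
  • +123 456 in Estonian (et-EE)
  • +1,23,456 in Assamese (as-IN)

Any answer must not attempt to account for these localization issues. You do not know the locale in which a number will be presented, which is why number localization has been removed from the question. You must strictly assume that numbers contain a plus sign or hyphen-minus, and the Latin digits 0 through 9. I already know how to parse localized numbers. This question is about trying to match an optional suffix with a greedy pattern parser.

Edit: You really don't have to try to handle localized numbers. At some level, trying to handle them without knowing the locale is wrong. For one thing, I didn't include all possible localizations of numbers. For another, I don't know what localizations might exist in the future.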

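Best answer

Hmm, I don't have Lua 4 installed, but this pattern works under Lua 5. I hope it works for Lua 4 as well.

Update 1: Since additional requirements have been specified (localization), I have adjusted the pattern and tests to reflect them.

Update 2: Updated the pattern and tests to handle an additional class of text containing numbers, as mentioned by @IanBoyd in the comments. Added an explanation of the string pattern.

Update 3: Added a variation for the case where localized numbers are handled separately, as mentioned in the last update to the question.

Try: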

    "(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"
    

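or (without attempting to validate the number localization separators), simply take anything that is not a letter, with a digit sentinel at the end of the number: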
    "(([%+%-][^%a]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"
    

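Neither of the patterns above is meant to handle numbers in scientific notation (e.g. 1.23e+10).

Lua 5 test (edited to clean up, since the tests were getting cluttered):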
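    function test(tab, pattern)
       for i, v in ipairs(tab) do
         local f1, f2, f3, f4 = v:match(pattern)
         -- tostring() keeps string.format from raising an error on Lua 5.1
         -- if a line fails to match and the captures are nil
         print(string.format("Test{%d} - Whole:{%s}\nFirst:{%s}\nSecond:{%s}\nThird:{%s}\n",
           i, tostring(f1), tostring(f2), tostring(f3), tostring(f4)))
       end
     end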
    
     local pattern = "(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"
     local testing = {"+123 Parry",
       "+123 Critical Strike",
       "-123 Parry",
       "-123 Critical Strike",
       "+123 Parry (Reforged from Dodge)",
       "+123 Critical Strike (Reforged from Dodge)",
       "-123 Parry (Reforged from Hit Chance)",
       "-123 Critical Strike (Reforged from Hit Chance)",
       "+122384    Critical    Strike      (Reforged from parry chance)",
       "+384 Critical Strike ",
       "+384Critical Strike (Reforged from parry chance)",
       "+1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
       "+12345 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)",
       "+123456 Mount and run speed increase (Reforged from CrItIcAl StRiKe ChAnCe)",
       "-1 MoUnT aNd RuN sPeEd InCrEaSe (Reforged from CrItIcAl StRiKe ChAnCe)",
       "-1 HiT (Reforged from CrItIcAl StRiKe ChAnCe)",
       "+123,456 +1234 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
       "+123.456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
       "+123'456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
       "+123 456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
       "+1,23,456 Critical Strike Chance (Reforged from CrItIcAl StRiKe ChAnCe)",
       "+9 mana every 5 sec",
       "-9 mana every 20 min (Does not occurr in data but gets captured if there)"}
     test(testing, pattern)
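
For a quicker check than the full test run, here is a sketch that applies both patterns to single inputs and looks at the individual captures. It assumes stock Lua 5.x; the names strict and loose are only labels for the two patterns above. The last line shows a limitation that follows from the stat capture needing at least three characters.

```lua
local strict = "(([%+%-][',%.%d%s]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"
local loose  = "(([%+%-][^%a]-[%d]+)%s*([%a]+[^%(^%)]+[%a]+)%s*(%(?[%a%s]*%)?))"

-- Captures are: whole match, signed number, stat name, optional suffix.
local whole, number, stat, suffix =
  string.match("+123 Critical Strike (Reforged from Dodge)", strict)
print(number, stat, suffix)

-- A localized number with no suffix: the suffix capture is an empty string.
local _, number2, stat2, suffix2 =
  string.match("+123,456 Critical Strike Chance", loose)
print(number2, stat2, "[" .. suffix2 .. "]")

-- A two-letter stat name cannot satisfy letter + non-bracket + letter.
local short = string.match("+1 Hp", strict)
print(short)
```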
    

Here is the pattern broken down:
    local explainPattern =  
       "(" -- start whole string capture
       ..
       --[[
       capture localized number with sign - 
       take at first as few digits and separators as you can 
       ensuring the capture ends with at least 1 digit
       (the last digit is our sentinel enforcing the boundary)]]
       "([%+%-][',%.%d%s]-[%d]+)" 
       ..
       --[[
       gobble as much space as you can]]
       "%s*"
       ..
       --[[
       capture start with letters, followed by anything which is not a bracket 
       ending with at least 1 letter]]
       "([%a]+[^%(^%)]+[%a]+)"
       ..
       --[[
       gobble as much space as you can]]
       "%s*"
       ..
       --[[
       capture an optional bracket
       followed by 0 or more letters and spaces
       ending with an optional bracket]]
       "(%(?[%a%s]*%)?)"
       .. 
       ")" -- end whole string capture
    

Regarding lua - Greedy/non-greedy pattern matching and optional suffixes in Lua, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13619193/
