regex - How do you match only valid Roman numerals with a regular expression?

Tags: regex roman-numerals

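While thinking about my other problem, I realized I can't even write a regular expression that matches Roman numerals (let alone a context-free grammar that generates them).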

The problem is matching only valid Roman numerals. For example, 990 is not "XM", it's "CMXC".

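My difficulty in writing the regex is that, in order to allow or disallow certain characters, I need to look back at what has already matched. Take the thousands and hundreds as an example.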

I can allow M{0,2}C?M (which permits 900, 1000, 1900, 2000, 2900 and 3000). However, if the match ended on CM, I can't allow the following character to be C or D (because I'm already at 900).

How can I express this in a regex?
If it simply can't be expressed in a regex, can it be expressed in a context-free grammar?

Best answer

You can use the following regex:

^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$
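As a minimal sketch of how this pattern might be used, the Python snippet below wraps it in a helper and tries it on the two strings from the question. The helper name `is_roman` and the choice of `re.fullmatch` are assumptions made for illustration, not part of the answer itself.

```python
import re

# The pattern from the answer, unchanged.
ROMAN = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")


def is_roman(text: str) -> bool:
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    # fullmatch also rejects a trailing newline, which `$` alone would tolerate.
    return ROMAN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["CMXC", "XM", "MCMXCIV", "IIII"]:
        print(candidate, is_roman(candidate))
```

With this pattern "CMXC" (990) is accepted while "XM" is rejected, which is the distinction the question asks for.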

Breaking it down, M{0,4} specifies the thousands section and basically restricts it to between 0 and 4000. It's a relatively simple one:

   0: <empty>  matched by M{0}
1000: M        matched by M{1}
2000: MM       matched by M{2}
3000: MMM      matched by M{3}
4000: MMMM     matched by M{4}

Of course, if you want to allow bigger numbers, you could use something like M* to allow any number (including zero) of thousands.

Next is (CM|CD|D?C{0,3}), which is slightly more complex. It handles the hundreds section and covers all the possibilities:

  0: <empty>  matched by D?C{0} (with D not there)
100: C        matched by D?C{1} (with D not there)
200: CC       matched by D?C{2} (with D not there)
300: CCC      matched by D?C{3} (with D not there)
400: CD       matched by CD
500: D        matched by D?C{0} (with D there)
600: DC       matched by D?C{1} (with D there)
700: DCC      matched by D?C{2} (with D there)
800: DCCC     matched by D?C{3} (with D there)
900: CM       matched by CM
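One way to convince yourself that the hundreds sub-pattern accepts exactly these ten strings and nothing else is to enumerate short candidates by brute force. The sketch below is an illustration only; the variable names and the candidate length limit of four characters are assumptions of this example.

```python
import re
from itertools import product

# The hundreds sub-pattern from the answer, tested on its own.
HUNDREDS = re.compile(r"(CM|CD|D?C{0,3})")

# Every string of length 0 to 4 over the letters used in the hundreds place.
candidates = ["".join(p) for n in range(5) for p in product("CDM", repeat=n)]

# Keep only the candidates that the sub-pattern matches in full.
accepted = [s for s in candidates if HUNDREDS.fullmatch(s)]

if __name__ == "__main__":
    print(len(accepted), accepted)
```

The same check should work for the tens and units sub-patterns by swapping in their letters.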

Third, (XC|XL|L?X{0,3}) follows the same rules as the previous section, but for the tens place:

 0: <empty>  matched by L?X{0} (with L not there)
10: X        matched by L?X{1} (with L not there)
20: XX       matched by L?X{2} (with L not there)
30: XXX      matched by L?X{3} (with L not there)
40: XL       matched by XL
50: L        matched by L?X{0} (with L there)
60: LX       matched by L?X{1} (with L there)
70: LXX      matched by L?X{2} (with L there)
80: LXXX     matched by L?X{3} (with L there)
90: XC       matched by XC

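Finally, (IX|IV|V?I{0,3}) is the units section, handling 0 through 9, and it is also similar to the previous two sections (Roman numerals, despite their seeming weirdness, follow some logical rules once you figure out what they are):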

0: <empty>  matched by V?I{0} (with V not there)
1: I        matched by V?I{1} (with V not there)
2: II       matched by V?I{2} (with V not there)
3: III      matched by V?I{3} (with V not there)
4: IV       matched by IV
5: V        matched by V?I{0} (with V there)
6: VI       matched by V?I{1} (with V there)
7: VII      matched by V?I{2} (with V there)
8: VIII     matched by V?I{3} (with V there)
9: IX       matched by IX
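The four tables above can be combined into a small generator and used to check the full pattern against every value it is meant to cover. This is a sketch under stated assumptions: `to_roman` is a hypothetical helper built directly from the tables, and the range 0 to 4999 follows from the M{0,4} limit.

```python
import re

ROMAN = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

# Digit tables copied from the breakdown above; the list index is the digit.
THOUSANDS = ["", "M", "MM", "MMM", "MMMM"]
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
UNITS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]


def to_roman(n: int) -> str:
    """Build the numeral for 0..4999 by concatenating one entry per decimal digit."""
    if not 0 <= n <= 4999:
        raise ValueError("only 0..4999 can be represented with M{0,4}")
    return (THOUSANDS[n // 1000] + HUNDREDS[n // 100 % 10]
            + TENS[n // 10 % 10] + UNITS[n % 10])


# Every generated numeral (including the empty string for 0) should match.
all_numerals = [to_roman(n) for n in range(5000)]
all_match = all(ROMAN.fullmatch(s) for s in all_numerals)

if __name__ == "__main__":
    print(to_roman(990), all_match)
```

Because each digit position is handled by its own group, the numeral is simply the concatenation of four independent table lookups, which is why no look-back is needed.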

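Keep in mind that this regex also matches the empty string, because every part of it is optional. If you don't want that (and your regex engine supports lookahead), you can add a positive lookahead after the start anchor that requires at least one numeral character:

^(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$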

(The other alternative is to check beforehand that the length is not zero.)
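Both ways of rejecting the empty string can be sketched in Python. The lookahead `(?=[MDCLXVI])`, which requires at least one numeral character, and the function names below are choices made for this example.

```python
import re

BASE = r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"

# Plain anchored pattern: every part is optional, so "" matches.
ALLOW_EMPTY = re.compile(r"^" + BASE + r"$")

# Lookahead variant: the first position must be followed by a numeral letter.
NON_EMPTY = re.compile(r"^(?=[MDCLXVI])" + BASE + r"$")


def is_roman_lookahead(text: str) -> bool:
    """Reject the empty string inside the regex itself."""
    return NON_EMPTY.fullmatch(text) is not None


def is_roman_length_check(text: str) -> bool:
    """Reject the empty string with an explicit length check first."""
    return len(text) > 0 and ALLOW_EMPTY.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["", "XIV", "XM"]:
        print(repr(candidate),
              is_roman_lookahead(candidate),
              is_roman_length_check(candidate))
```

The length check keeps the pattern simpler and works on engines without lookahead support; the lookahead keeps the whole rule in a single expression.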

Regarding "regex - How do you match only valid Roman numerals with a regular expression?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/267399/
