python - Replacing values with the previously occurring acronym using the Python regex module

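I need to prepend the preceding word to every bare -number that follows a word-number item in a sentence. Please see the input string and the expected output string below for clarification. I have tried the .replace and .sub methods with static (hard-coded) patterns, which only amounts to hand-crafting the output.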

Input string:

The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes.

Expected output string:

The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.

Code:

import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
regex1 = re.findall(r"[a-z]+\s+\(+[A-Z]+\)+-\d+\,\s+-\d\,+", string_a)
regex2 = re.findall(r"[A-Z]+-\d+\,\s+-\d\,\s+-\d\,\s+-\d\,\s+[a-z]+\s+-\d+", string_a)

Best answer

You can use

import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?")
print( pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f', and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a) )
# => The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.

See the Python demo and the regex demo.

Details

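\b - a word boundary
([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+) - Capturing group 1: one or more ASCII letters, then zero or more whitespace characters, (, one or more uppercase ASCII letters, and a ), OR one or more uppercase ASCII letters
(\s*-\d+(?:,\s*-\d+)*) - Capturing group 2: zero or more whitespace characters, -, one or more digits, and then zero or more sequences of a comma, zero or more whitespace characters, -, and one or more digits
(?:,\s*and\s+(-\d+))? - an optional non-capturing group: a comma, zero or more whitespace characters, the literal word and, one or more whitespace characters, and then Capturing group 3: -, one or more digits.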
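
To check the breakdown above, the same pattern can be written in verbose mode and the captured groups printed for each match. This is only a minimal sketch: the names PATTERN, SAMPLE, and captured are illustrative and do not appear in the original answer.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so each part can carry a comment.
PATTERN = re.compile(r"""
    \b
    ( [A-Za-z]+ \s* \( [A-Z]+ \)     # Group 1: a word plus a parenthesised acronym, e.g. interleukin (IL)
    | [A-Z]+ )                       #          or a bare uppercase acronym, e.g. MMP
    ( \s* -\d+ (?: , \s* -\d+ )* )   # Group 2: -N followed by any number of ", -N" items
    (?: , \s* and \s+ ( -\d+ ) )?    # optional ", and -N" tail; Group 3 holds the -N
""", re.VERBOSE)

SAMPLE = ("The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, "
          "LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes.")

# One tuple (group 1, group 2, group 3) per match; group 3 is None when there is no ", and -N" tail.
captured = [m.groups() for m in PATTERN.finditer(SAMPLE)]
for groups in captured:
    print(groups)
```

Group 3 is None for the first two matches, which is why the replacement has to test it before appending anything.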

Inside the lambda used as the replacement argument, the Group 1 value is prepended to each of the comma-separated numbers in Group 2.

If Group 3 matched, a comma, the word and, a space, and the concatenated Group 1 and Group 3 values are appended.
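
The one-line lambda is dense, so here is a sketch of the same replacement logic as a named function. The names PATTERN, expand, and expand_acronyms are hypothetical helpers introduced for illustration; the behaviour is intended to mirror the lambda in the answer, not to replace it.

```python
import re

# Pattern copied from the answer.
PATTERN = re.compile(
    r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?"
)

def expand(match):
    """Prefix every -N item in the match with the name captured in Group 1."""
    name = match.group(1)
    # Group 2 looks like "-1, -2, -3"; split it into ["-1", "-2", "-3"].
    suffixes = [part.strip() for part in match.group(2).split(",")]
    result = ", ".join(name + suffix for suffix in suffixes)
    # Group 3 is only set when the list ends with ", and -N".
    if match.group(3):
        result += f", and {name}{match.group(3)}"
    return result

def expand_acronyms(text):
    """Apply the expansion to every match in the text."""
    return PATTERN.sub(expand, text)

print(expand_acronyms("MMP-1, -2, and -13"))
# => MMP-1, MMP-2, and MMP-13
```

Passing a function to sub keeps the regex unchanged while making each step of the replacement easy to test on its own.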

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/66487506/
