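python - Extracting codes with a regular expression (irregular regex keys)

I am extracting codes from a list of strings taken from email subject lines. The list looks like this: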

text_list = ['Industry / Gemany / PN M564839', 'Industry / France / PN: 575-439', 'Telecom / Gemany / P/N 26-59-29', 'Mobile / France / P/N: 88864839']

What I have tried so far:

import re

def get_p_number(text):
    rx = re.compile(r'[p/n:]\s+((?:\w+(?:\s+|$)){1})',
                    re.I)
    res = []
    m = rx.findall(text)
    if len(m) > 0:
        m = [p_number.replace(' ', '').upper() for p_number in m]
        # remove_duplicates is a helper defined elsewhere (not shown)
        m = remove_duplicates(m)
        res.append(m)
    else:
        res.append('no P Number found')
    return res

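My problem is that I cannot extract the code that follows the keys ['PN', 'P/N', 'PN:', 'P/N:'], especially when the code starts with a letter (e.g. "M") or contains hyphens (e.g. 26-59-29).

My desired output is: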

res = ['M564839','575-439','26-59-29','88864839']

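Best answer

In your pattern, [p/n:]\s+ is a character class followed by \s+: it matches exactly one of the listed characters followed by 1 or more whitespace characters. In the example data, that matches, for instance, a forward slash or a colon followed by a space.

The next part, (?:\w+(?:\s+|$)), matches 1 or more word characters followed by either 1 or more whitespace characters or the end of the string. It does not account for a whitespace character or a hyphen in the middle of the code.

One option is to match PN with an optional / and an optional :, instead of using the character class [p/n:], and to capture the value in a capturing group: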
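
To make the two observations above concrete, here is a small sketch that runs the pattern from the question against two of the sample strings. The variable names are only illustrative; the pattern and the inputs are taken from the question.

```python
import re

# The pattern from the question, unchanged.
rx = re.compile(r'[p/n:]\s+((?:\w+(?:\s+|$)){1})', re.I)

samples = [
    'Industry / Gemany / PN M564839',
    'Industry / France / PN: 575-439',
]

for text in samples:
    # findall returns the contents of group 1 for every match.
    print(rx.findall(text))

# ['Gemany ', 'PN ']
# ['France ']
```

The class matches the "/" separators, so the word after each slash is captured instead of the code. In the second string, "575" is followed by a hyphen rather than whitespace or the end of the string, so the code itself never matches.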

/ P/?N:? ([\w-]+)
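
As a reading aid, the same pattern can be spelled out piece by piece with re.VERBOSE. This is only a sketch: the name PN_PATTERN is made up for illustration, and the literal spaces are written as [ ] because verbose mode ignores unescaped whitespace outside character classes.

```python
import re

# Hypothetical name; same pattern as above, annotated with re.VERBOSE.
PN_PATTERN = re.compile(
    r"""
    /[ ]        # the slash and space that precede the key
    P/?N        # the key itself: PN or P/N
    :?          # an optional colon
    [ ]         # the single space between key and code
    ([\w-]+)    # group 1: the code, made of word characters and hyphens
    """,
    re.VERBOSE,
)

print(PN_PATTERN.search("Telecom / Gemany / P/N 26-59-29").group(1))
# 26-59-29
```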

For example:

import re
text_list = ['Industry / Gemany / PN M564839', 'Industry / France / PN: 575-439', 'Telecom / Gemany / P/N 26-59-29', 'Mobile / France / P/N: 88864839']
regex = r"/ P/?N:? ([\w-]+)"
res = []
for text in text_list: 
    matches = re.search(regex, text)
    if matches:
        res.append(matches.group(1))

print(res)

Result

['M564839', '575-439', '26-59-29', '88864839']
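
If you want to keep the shape of the function from the question, one possible sketch is below. It assumes one code per string, adds re.IGNORECASE on the assumption that lowercase keys such as "p/n" may occur, and returns the question's 'no P Number found' text when nothing matches. The names PN_RE and default are illustrative and not part of the original answer.

```python
import re

# Compiled once; re.IGNORECASE is an assumption carried over from the
# re.I flag used in the question.
PN_RE = re.compile(r"/ P/?N:? ([\w-]+)", re.IGNORECASE)

def get_p_number(text, default="no P Number found"):
    """Return the code after PN / P/N / PN: / P/N:, or default if absent."""
    match = PN_RE.search(text)
    return match.group(1).upper() if match else default

text_list = ['Industry / Gemany / PN M564839', 'Industry / France / PN: 575-439', 'Telecom / Gemany / P/N 26-59-29', 'Mobile / France / P/N: 88864839']
print([get_p_number(text) for text in text_list])
# ['M564839', '575-439', '26-59-29', '88864839']
```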

Regarding python - Extracting codes with a regular expression (irregular regex keys), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56831129/
