python - Regex and formatting in Python

I have an input dataset like this:

INPUT = [
'ABCD , D.O.B: - Jun/14/1999.',
'EFGH , DOB; - Jan/10/1998,',
'IJKL , D-O-B - Jul/15/1985..',
'MNOP , (DOB)* - Dec/21/1999,',
'QRST , *DOB* - Apr/01/2000.',
'UVWX , D O B, - Feb/11/2001 '
]

I want the output in the following format:

OUTPUT = [
('ABCD, Jun/14/1999'),
('EFGH, Jan/10/1998'),
('IJKL, Jul/15/1985'),
('MNOP, Dec/21/1999'),
('QRST, Apr/1/2000'),
('UVWX, Feb/11/2001')
]

I tried the code below. It partly works, but I can't get the result into the required output format:

import re

INPUT = [
'ABCD , D.O.B: - Jun/14/1999.',
'EFGH , DOB; - Jan/10/1998,',
'IJKL , D-O-B - Jul/15/1985..',
'MNOP , (DOB)* - Dec/21/1999,',
'QRST , *DOB* - Apr/01/2000.',
'UVWX , D O B, - Feb/11/2001 '
]


def formatted_def(input):
    for n in input:
        t = re.sub('[^a-zA-Z0-9 ]+','',n).split('DOB')
        print(t)


formatted_def(INPUT)

Output:

['ABCD  ', '  Jun141999']
['EFGH  ', '  Jan101998']
['IJKL  ', '  Jul151985']
['MNOP  ', '  Dec211999']
['QRST  ', '  Apr012000']
['UVWX  D O B  Feb112001 ']

Any pointers would be very helpful. Thanks in advance!

Best answer

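import re
# INPUT is the list from the question; joining it lets one findall call scan every record.
# Group 1 is the name before the comma; group 2 is the date after "- " (it stops at '.', ',' or a space).
re.findall(r'(\w+)\s+,.*?-\s+([^., ]*)', ' '.join(INPUT))
# [('ABCD', 'Jun/14/1999'), ('EFGH', 'Jan/10/1998'), ('IJKL', 'Jul/15/1985'), ('MNOP', 'Dec/21/1999'), ('QRST', 'Apr/01/2000'), ('UVWX', 'Feb/11/2001')]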
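
The findall call above returns (name, date) tuples, while the question asks for single 'NAME, date' strings. The sketch below is one possible way to get there, not part of the original answer. It makes two assumptions: the parenthesized entries in OUTPUT are plain strings, and the 'Apr/1/2000' entry means a leading zero in the day should be dropped (it could equally be a typo in the question). The names RECORD and format_records are made up for illustration.

```python
import re

INPUT = [
    'ABCD , D.O.B: - Jun/14/1999.',
    'EFGH , DOB; - Jan/10/1998,',
    'IJKL , D-O-B - Jul/15/1985..',
    'MNOP , (DOB)* - Dec/21/1999,',
    'QRST , *DOB* - Apr/01/2000.',
    'UVWX , D O B, - Feb/11/2001 ',
]

# Name, comma, anything up to the "- " separator, then month/day/year.
# RECORD is a hypothetical name; the pattern is a stricter variant of the one above.
RECORD = re.compile(r'(\w+)\s*,.*?-\s+([A-Za-z]{3})/(\d{1,2})/(\d{4})')


def format_records(lines):
    """Return 'NAME, Mon/D/YYYY' strings; lines that do not match are skipped."""
    result = []
    for line in lines:
        match = RECORD.search(line)
        if match is None:
            continue
        name, month, day, year = match.groups()
        # Assumption: int() drops the leading zero so '01' becomes '1',
        # matching 'Apr/1/2000' in the question's OUTPUT.
        result.append(f'{name}, {month}/{int(day)}/{year}')
    return result


OUTPUT = format_records(INPUT)
print(OUTPUT)
```

Matching each line separately, rather than joining the list first, keeps one malformed record from affecting its neighbours. If the leading zero should be kept, replacing `int(day)` with `day` is enough.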

This question about regex and formatting in Python is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/51030488/
