python - Regex to extract text between two strings

Tags: python regex

I am trying to use regular expressions to extract data fields from the text of a PDF.

The text is:

"SAMPLE EXPERIAN CUSTOMER\n2288150 - EXPERIAN SAMPLE REPORTS\nData Dictionary Report\nFiltered By:\nCustom Selection\nMarketing Element:\nPage 1 of 284\n2014-11-11 21:52:01 PM\nExperian and the marks used herein are service marks or registered trademarks of Experian.\n© Experian 2014 All rights reserved. Confidential and proprietary.\n**Data Dictionary**\nDate of Birth is acquired from public and proprietary files. These sources provide, at a minimum, the year of birth; the month is provided where available. Exact date of birth at various levels of detail is available for \n\n\n\n\n\nNOTE: Records coded with DOB are exclusive of Estimated Age (101E)\n**Element Number**\n0100\nDescription\nDate Of Birth / Exact Age\n**Data Dictionary**\n\n\n\n\n\n\n\n\n\n\nFiller, three bytes\n**Element Number**\n0000\n**Description**\nEnhancement Mandatory Append\n**Data Dictionary**\n\n\nWhen there is insufficient data to match a customer's record to our enrichment master for estimated age, a median estimated age based on the ages of all other adult individuals in the same ZIP+4 area is provided. \n\n\n\n\n\n\n00 = Unknown\n**Element Number**\n0101E\n**Description**\nEstimated Age\n"

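The field names are shown in bold. The text between two field names is the field value.

My first attempt was to extract the "Description" field with the following regular expression: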

pattern = re.compile('\nDescription\n(.*?)\nData Dictionary\n')
re.findall(pattern,text)

The result is correct:

['Date Of Birth / Exact Age', 'Enhancement Mandatory Append']

But using the same idea to extract the "Data Dictionary" field gives an empty result:

pattern = re.compile('\nData Dictionary\n(.*?)\nElement Number\n')
re.findall(pattern,text)

Result:

[]

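Any idea why?

Best answer

By default, . does not match newline characters. The "Description" values each sit on a single line, so (.*?) can match them, but the "Data Dictionary" values span several lines, so the match can never reach "\nElement Number\n". Try: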

pattern = re.compile('\nData Dictionary\n(.*?)\nElement Number\n', flags=re.DOTALL)
re.findall(pattern,text)

Note how re.DOTALL is passed as the flags argument to re.compile.

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/31885340/
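The difference can be checked with a small self-contained sketch. The text below is a shortened, hypothetical stand-in for the PDF text in the question (it assumes the bold markers are not part of the actual string), and the variable names are only illustrative. It also shows the inline (?s) form, which should behave the same as passing re.DOTALL, and strips the surrounding blank lines from each captured value.

```python
import re

# Shortened, hypothetical stand-in for the PDF text from the question.
text = (
    "\nData Dictionary\n"
    "Date of Birth is acquired from public and proprietary files.\n\n"
    "NOTE: Records coded with DOB are exclusive of Estimated Age (101E)\n"
    "Element Number\n0100\nDescription\nDate Of Birth / Exact Age\n"
    "Data Dictionary\n\n\nFiller, three bytes\n"
    "Element Number\n0000\nDescription\nEnhancement Mandatory Append\n"
)

raw_pattern = r"\nData Dictionary\n(.*?)\nElement Number\n"

# Without DOTALL, '.' stops at every newline, so multi-line values never match.
without_dotall = re.findall(raw_pattern, text)

# With DOTALL, '.' also matches '\n', so the lazy group can span several lines.
with_dotall = re.findall(raw_pattern, text, flags=re.DOTALL)

# The inline flag (?s) at the start of the pattern is an alternative spelling.
with_inline_flag = re.findall("(?s)" + raw_pattern, text)

# Captured values may carry leading or trailing blank lines; strip them.
cleaned = [value.strip() for value in with_dotall]

print(without_dotall)
print(cleaned)
```

Keeping the quantifier lazy (.*? rather than .*) matters once DOTALL is on: a greedy group would run to the last "Element Number" in the text and merge all the values into one match.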