python - How to find the nearest number before or after specific text in Python

Tags: python regex

Sample text:

1. There are 500 employees in our organisation.

2. Abbott employed approximately 103,000 people as of December 31, 2018

3. We currently employ approximately 1,750 employees

4. As of December 31, 2018, we had approximately 25,300 full-time employees.

Now I want to find the number closest to the word "employe", whether it comes before or after it.

import re
text = "There are 500 employees in our organisation."  # first sample sentence
c = re.search(r'(\w+\s+){0,3}employe(\w+\s+){0,3}', text, re.IGNORECASE)
print(c.group(0))

Expected result:

1. 500
2. 103,000
3. 1,750
4. 25,300

With the code above, I am trying to grab the words around "employe" and then pick the number out of them.

Is there a better way?

Best answer

Perhaps an expression along these lines:

(?:\bemploye\D{0,20})([0-9][0-9,]*)[^.,]|([0-9][0-9,]*)[^.,](?:\D{0,20}employe)

It may still need some adjustment for your actual input.
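To make the expression easier to adjust, here is a sketch of the same pattern written with `re.VERBOSE`, so that each part carries a comment. The name `NEAREST_NUMBER` and the `sample` string are only illustrative, and the inline `(?i)` flag is passed as `re.IGNORECASE` instead.

```python
import re

# Same pattern as above, split across lines with re.VERBOSE.
# NEAREST_NUMBER is a hypothetical name used only for this sketch.
NEAREST_NUMBER = re.compile(
    r"""
    (?:\bemploye\D{0,20})    # keyword at a word start, then up to 20 non-digit characters
    ([0-9][0-9,]*)           # group 1: a number that follows the keyword
    [^.,]                    # one more character that is neither a dot nor a comma
    |
    ([0-9][0-9,]*)           # group 2: a number that precedes the keyword
    [^.,]                    # one character that is neither a dot nor a comma
    (?:\D{0,20}employe)      # up to 20 non-digit characters, then the keyword
    """,
    re.IGNORECASE | re.VERBOSE,
)

sample = "Abbott employed approximately 103,000 people as of December 31, 2018"
print(NEAREST_NUMBER.findall(sample))  # [('103,000', '')]
```

The `{0,20}` bounds are the obvious knobs to tune: they limit how many non-digit characters may sit between the keyword and the number.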


Test

import re

expression = r"(?i)(?:\bemploye\D{0,20})([0-9][0-9,]*)[^.,]|([0-9][0-9,]*)[^.,](?:\D{0,20}employe)"
string = """
1. There are 500 employees in our organisation.
2. Abbott employed approximately 103,000 people as of December 31, 2018
3. We currently employ approximately 1,750 employees
4. As of December 31, 2018, we had approximately 25,300 full-time employees.
5. As of December 31, 2018, we had approximately 30 full-time employees.
6. As of December 31, 2018, we had approximately 3 full-time employees.
"""
print(re.findall(expression, string))

Output

[('', '500'), ('103,000', ''), ('', '1,750'), ('', '25,300'), ('', '30'), ('', '3')]
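Each tuple has one empty slot because only one of the two capture groups takes part in a given match. A minimal sketch of collapsing that into the plain numbers the question asked for, one per sentence, might look like this; `first_number` and `lines` are hypothetical names introduced for illustration.

```python
import re

expression = r"(?i)(?:\bemploye\D{0,20})([0-9][0-9,]*)[^.,]|([0-9][0-9,]*)[^.,](?:\D{0,20}employe)"

def first_number(line):
    """Return the first number the expression captures in line, or None."""
    match = re.search(expression, line)
    if match is None:
        return None
    # Only one of the two groups participates in a match; keep that one.
    return match.group(1) or match.group(2)

lines = [
    "There are 500 employees in our organisation.",
    "Abbott employed approximately 103,000 people as of December 31, 2018",
    "We currently employ approximately 1,750 employees",
    "As of December 31, 2018, we had approximately 25,300 full-time employees.",
]
numbers = [first_number(line) for line in lines]
print(numbers)  # ['500', '103,000', '1,750', '25,300']
```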

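If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you like, you can also watch in this link how it would match against some sample inputs.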
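Note that the expression only looks within roughly 20 non-digit characters of the keyword. If a strictly nearest number is wanted, one alternative, which is not part of the original answer, is to locate every number and every keyword occurrence separately and compare character distances. The sketch below assumes numbers use commas as thousands separators; `NUMBER`, `KEYWORD`, `gap`, and `nearest_number` are hypothetical names.

```python
import re

NUMBER = re.compile(r"\d+(?:,\d{3})*")            # digits with optional ,ddd groups
KEYWORD = re.compile(r"employe", re.IGNORECASE)   # the word stem from the question

def gap(a, b):
    """Number of characters between two non-overlapping match spans."""
    return max(a.start() - b.end(), b.start() - a.end())

def nearest_number(text):
    """Return the number with the smallest gap to any keyword match, or None."""
    keywords = list(KEYWORD.finditer(text))
    numbers = list(NUMBER.finditer(text))
    if not keywords or not numbers:
        return None
    # For each number, measure its distance to the closest keyword occurrence.
    best = min(numbers, key=lambda n: min(gap(n, k) for k in keywords))
    return best.group(0)

print(nearest_number(
    "As of December 31, 2018, we had approximately 25,300 full-time employees."
))  # 25,300
```

When two numbers are equally close, `min` keeps the one that appears first in the text.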


RegEx Circuit

jex.im visualizes regular expressions.


Regarding python - How to find the nearest number before or after specific text in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57945858/
