python - Extracting numbers from a text file using regex

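I am trying to write a Python script that reads a text file, input.txt, scans it for all phone numbers, and writes every matching phone number to output.txt.

Suppose the text file looks like this: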

Hey my number is 1234567890 and another number is +91-1234567890. but if none of these is available you can call me on +91 5645454545 (or) mail me at abc@xyz.com

It should match 1234567890, +91-1234567890 and +91 5645454545.

import re

no = '^(\+[1-9]\d{0,2}[- ]?)?[1-9][0-9]{9}' #i think problem is here
f2 = open('output.txt','w+')

for line in open('input.txt'):
    out = re.findall(no,line)
    for i in out : 
        f2.write(i + '\n')

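The regex in no is meant to work like this: an optional country code of up to 3 digits, followed by an optional - or space, and then a 10-digit number.

Best answer

Yes, the problem is in your regex. Fortunately, it is a small one. You only need to remove the ^ character: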

'(\+[1-9]\d{0,2}[- ]?)?[1-9]\d{9}'

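^ anchors the match to the start of the string, but you want to find matches anywhere in the string, and more than once. Here is a regex101 demo.

For Python, you also need to make the group non-capturing with ?:. Otherwise, re.findall does not return the full match: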
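To see the effect of the anchor on its own, here is a small sketch that runs both patterns over a shortened version of the sample text. It uses re.finditer and m.group(0) so that each result is the full match; the variable names are only for illustration.

```python
import re

text = "Hey my number is 1234567890 and another number is +91-1234567890."

# Pattern from the question (anchored) and the same pattern without ^.
with_anchor = r'^(\+[1-9]\d{0,2}[- ]?)?[1-9][0-9]{9}'
without_anchor = r'(\+[1-9]\d{0,2}[- ]?)?[1-9]\d{9}'

# m.group(0) is always the whole match, regardless of any groups.
anchored = [m.group(0) for m in re.finditer(with_anchor, text)]
unanchored = [m.group(0) for m in re.finditer(without_anchor, text)]

print(anchored)    # [] because the line starts with "Hey", not a number
print(unanchored)  # ['1234567890', '+91-1234567890']
```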

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups.

Bold emphasis mine. Here is a relevant question.

Here is what you get for the question's text when the group is non-capturing:
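As a rough illustration of the documented behaviour, the sketch below shows what re.findall returns while the country-code group is still capturing, and one way to get the full matches without changing the pattern (re.finditer with m.group(0)). The sample string is made up for the example.

```python
import re

text = "call 1234567890 or +91-1234567890 or +91 5645454545"

# Same pattern as above, with the country code in a capturing group.
capturing = r'(\+[1-9]\d{0,2}[- ]?)?[1-9]\d{9}'

# With one capturing group, findall returns only that group's text;
# a group that did not take part in the match shows up as ''.
groups_only = re.findall(capturing, text)
print(groups_only)   # ['', '+91-', '+91 ']

# finditer yields match objects, and group(0) is the whole match.
full_matches = [m.group(0) for m in re.finditer(capturing, text)]
print(full_matches)  # ['1234567890', '+91-1234567890', '+91 5645454545']
```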

In [485]: re.findall('(?:\+[1-9]\d{0,2}[- ]?)?[1-9]\d{9}', text)
Out[485]: ['1234567890', '+91-1234567890', '+91 5645454545']
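Putting it together, the script from the question could look roughly like the sketch below. The function name extract_phone_numbers and the constant PHONE_RE are my own, not from the question; the pattern is the non-capturing one shown above, written as a raw string.

```python
import re

# Non-capturing group, so findall returns the whole match.
PHONE_RE = re.compile(r'(?:\+[1-9]\d{0,2}[- ]?)?[1-9]\d{9}')


def extract_phone_numbers(input_path, output_path):
    """Write every phone number found in input_path to output_path,
    one per line, and return the numbers as a list."""
    found = []
    # The with-statement closes both files, even if an error occurs.
    with open(input_path) as src, open(output_path, 'w') as dst:
        for line in src:
            for number in PHONE_RE.findall(line):
                dst.write(number + '\n')
                found.append(number)
    return found
```

Calling extract_phone_numbers('input.txt', 'output.txt') then matches the file names used in the question.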

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/45604701/
