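python - How do I extract multiple values from the same string with a regular expression in Python?

Tags: python regex web-scraping beautifulsoup

I am currently trying to scrape some data from a web page. The data I need is inside a <meta> tag in the HTML source. Scraping the data with BeautifulSoup and saving it as a string is no problem.

The string contains 2 numbers I want to extract. Each of these numbers (ratings from 1 to 100) should be assigned to its own variable for further processing.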

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

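The first value is 79/100 and the second is 86/100, but I only need 79 and 86. So far I have written a regex search to find these values, followed by .replace("/100", "") to clean things up.

In my code, however, I only get the value of the first regex match, which is 79. I tried m.group(1) to get the second value, but it does not work.

What am I missing?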

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

m = re.search("../100", test_str)
if m:
    found = m.group(0).replace("/100", "")
    print(found)

    # output -> 79

Thanks for your help.

Best regards!

Best answer

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"
m = re.findall(r'(\d+(?=/100))', test_str)
# m = ['79', '86']
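
The following sketch is an illustration added to show why the question's m.group(1) call could not return the second value. It uses only the standard re module. The variable names first and all_matches are made up for this example.

```python
import re

test_str = '<meta content="Overall Rating: 79/100 ... Some Info ... Score: 86/100 "/>'

# re.search stops at the first match, so only 79/100 is ever seen.
first = re.search(r"\d+/100", test_str)
print(first.group(0))

# The pattern contains no parentheses, so there is no group 1 to ask for.
# group(N) refers to the N-th capture group, not to the N-th match.
try:
    first.group(1)
except IndexError as exc:
    print("group(1) failed:", exc)

# re.findall scans the whole string and returns every non-overlapping match.
all_matches = re.findall(r"\d+/100", test_str)
print(all_matches)
```

If match objects are needed instead of plain strings (for example to read positions), re.finditer is the usual alternative to re.findall.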

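I changed .. to \d+ so that the pattern matches one or more digits, which means a 1-digit score is found as well as a 2-digit one.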
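
As a rough illustration of that difference, the sketch below runs both patterns over a made-up string (sample is not from the question) that contains a 1-digit, a 2-digit, and a 3-digit score.

```python
import re

# Hypothetical input with scores of different widths.
sample = "Low: 7/100 ... Mid: 79/100 ... Top: 100/100"

# ".." matches any two characters, digits or not, so it picks up a
# leading space for "7" and truncates "100" to its last two digits.
two_chars = re.findall(r"../100", sample)
print(two_chars)

# "\d+" matches a run of one or more digits, whatever its length.
digits = re.findall(r"\d+(?=/100)", sample)
print(digits)
```

With this input the first pattern yields ' 7/100', '79/100' and '00/100', while the second yields '7', '79' and '100'.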

I also used a positive lookahead (?=...), so the .replace call becomes unnecessary.
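
Since the question asks for each number in its own variable, here is a minimal sketch of that last step. The names overall_rating and score are assumptions chosen for this example. It also shows a capture group as an alternative to the lookahead. Both give the same list for this input.

```python
import re

test_str = '<meta content="Overall Rating: 79/100 ... Some Info ... Score: 86/100 "/>'

# Lookahead version: "/100" must follow the digits but is not part of the match.
with_lookahead = re.findall(r"\d+(?=/100)", test_str)

# Capture-group version: "/100" is matched, but findall returns only the group.
with_group = re.findall(r"(\d+)/100", test_str)

# Convert to integers and unpack into two variables.
# This raises ValueError unless exactly two values were found.
overall_rating, score = (int(value) for value in with_lookahead)
print(overall_rating, score)
```

The unpacking assumes the string always holds exactly two scores. If that may vary, keeping the list and checking its length first is the safer route.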

An example is available on Regex101.

Regarding "python - How do I extract multiple values from the same string with a regular expression in Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44095791/
