python - Regex on Beautiful Soup output

Tags: python regex python-3.x beautifulsoup

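I am trying to extract the lines that contain the word "billion" from an HTML page parsed with Beautiful Soup, but I get an empty list. The lines sit between <li> tags. I also tried soup.findAll("<li>", {"class": "tabcontent"}), but that gave me an empty list as well.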

import requests
from bs4 import BeautifulSoup
import re

url = 'http://www.worldstopexports.com/united-states-top-10-exports/'

header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}

page = requests.get (url, headers=header)

soup = BeautifulSoup (page.text, 'lxml')

table = soup.find_all (class_='tabcontent')[0].text

print(re.findall(r'^.*? billion', table))

print(table)



Machinery including computers: US$201.7 billion (13% of total exports)
Electrical machinery, equipment: $174.2 billion (11.3%)
Mineral fuels including oil: $138 billion (8.9%)
Aircraft, spacecraft: $131.2 billion (8.5%)
Vehicles: $130.1 billion (8.4%)
Optical, technical, medical apparatus: $83.6 billion (5.4%)
Plastics, plastic articles: $61.5 billion (4%)
Gems, precious metals: $60.4 billion (3.9%)
Pharmaceuticals: $45.1 billion (2.9%)
Organic chemicals: $36.2 billion (2.3%)

Best answer

You can use select() to grab the tab first, then collect its li children and their text:

# ... right under soup = BeautifulSoup (page.text, 'lxml') ...
# select the first tab
tab = soup.select('div.tabcontent')[0]

# select its items
items = [text 
    for item in tab.select('li') 
    for text in [item.text] 
    if "billion" in text]
print(items)

This yields

['Machinery including computers: US$201.7 billion (13% of total exports)', 'Electrical machinery, equipment: $174.2 billion (11.3%)', 'Mineral fuels including oil: $138 billion (8.9%)', 'Aircraft, spacecraft: $131.2 billion (8.5%)', 'Vehicles: $130.1 billion (8.4%)', 'Optical, technical, medical apparatus: $83.6 billion (5.4%)', 'Plastics, plastic articles: $61.5 billion (4%)', 'Gems, precious metals: $60.4 billion (3.9%)', 'Pharmaceuticals: $45.1 billion (2.9%)', 'Organic chemicals: $36.2 billion (2.3%)']
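If you would rather keep the regex approach from the question, one likely cause of the empty list is that `^` only matches at the very start of the string unless `re.MULTILINE` is set, and `.` does not cross newlines. The sketch below illustrates this on a few lines copied from the printed output; the leading blank line in `table` is an assumption about what `.text` returns for the real page, not something confirmed by the question.

```python
import re

# Sample text copied from the question's printed output. The leading
# newline is an assumption about what `.text` returns for the real page.
table = """
Machinery including computers: US$201.7 billion (13% of total exports)
Electrical machinery, equipment: $174.2 billion (11.3%)
Mineral fuels including oil: $138 billion (8.9%)
"""

# Without re.MULTILINE, ^ matches only at position 0. The first line is
# empty and "." cannot cross the newline, so nothing is found.
without_flag = re.findall(r'^.*? billion', table)
print(without_flag)

# With re.MULTILINE, ^ matches at the start of every line.
with_flag = re.findall(r'^.*? billion', table, flags=re.MULTILINE)
print(with_flag)
```

Note that the pattern stops at " billion", so each match is cut off before the percentage in parentheses. To capture whole lines, a pattern such as `r'^.*billion.*$'` with the same flag should work.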

Regarding "python - Regex on Beautiful Soup output", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49371947/
