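I am trying to scrape data from this URL: https://www.apple.com/ca/shop/browse/home/specialdeals/mac/macbook_pro/13
I want to retrieve the line that reads "8GB of 2133MHz LPDDR3 onboard memory" or "16GB of 2133MHz LPDDR3 onboard memory" from each container in
containers = soup.findAll('tr', {'class': 'product'})
using BeautifulSoup. The problem is that the text is surrounded by line breaks, often several in a row, which makes it hard to parse. How can I retrieve it?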
Best answer
Looking at the page source, the best option is to combine BeautifulSoup with a regular expression:
import re

import requests
from bs4 import BeautifulSoup

url = "https://www.apple.com/ca/shop/browse/home/specialdeals/mac/macbook_pro/13"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

# Each product's spec text lives in a <td class="specs"> cell.
for td in soup.select('td.specs'):
    # re.M lets ^ and $ match at every line break, so the pattern picks out
    # the one line that starts with 8 or 16 and mentions "onboard memory".
    m = re.search(r'^(8|16).*?onboard memory.*?$', td.text, flags=re.M | re.I)
    if not m:
        continue
    print(td.select_one('h3').text.strip())
    print('Full text: {} | Memory: {}'.format(m[0].strip(), m[1]))
    print('-' * 80)
This code finds every product with 8 or 16 GB of memory and prints it:
Refurbished 13.3-inch MacBook Pro 2.3GHz dual-core Intel Core i5 with Retina display - Space Grey
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 2.3GHz dual-core Intel Core i5 with Retina display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 2.0GHz Dual-core Intel Core i5 with Retina Display — Space Grey
Full text: 8GB of 1866MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 2.3GHz dual-core Intel Core i5 with Retina display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 2.3GHz dual-core Intel Core i5 with Retina display - Space Grey
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch Macbook Pro 2.9GHz Dual-core Intel Core i5 with Retina Display - Space Grey
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch Macbook Pro 2.9GHz Dual-core Intel Core i5 with Retina Display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch Macbook Pro 2.9GHz Dual-core Intel Core i5 with Retina Display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 3.1GHz dual-core Intel Core i5 with Retina display - Silver
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch MacBook Pro 3.1GHz dual-core Intel Core i5 with Retina display - Space Grey
Full text: 8GB of 2133MHz LPDDR3 onboard memory | Memory: 8
--------------------------------------------------------------------------------
Refurbished 13.3-inch Macbook Pro 3.3GHz Dual-core Intel Core i7 with Retina Display - Space Grey
Full text: 16GB of 2133MHz LPDDR3 onboard memory | Memory: 16
--------------------------------------------------------------------------------
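To see why the multiline flag handles the line breaks, here is a minimal offline sketch that applies the same idea to a hard-coded string instead of the live page. The sample text, the name `find_memory`, and the slightly stricter pattern (it tolerates leading and trailing spaces and requires `GB`) are assumptions for illustration, not part of the answer above or copied from the Apple page.

```python
import re

# Hypothetical stand-in for td.text: spec lines separated by blank lines and
# padded with whitespace. This is illustrative, not the real page content.
SAMPLE_TEXT = """
    Refurbished 13.3-inch MacBook Pro

        Originally released June 2017

    16GB of 2133MHz LPDDR3 onboard memory

        256GB SSD
"""

# With re.M, ^ and $ anchor at each line break, and "." never crosses a
# newline, so the match is confined to a single line of the text.
MEMORY_RE = re.compile(
    r'^[ \t]*(8|16)GB.*?onboard memory[ \t]*$',
    flags=re.M | re.I,
)


def find_memory(text):
    """Return (full_line, size_in_gb) for the first memory line, or None."""
    m = MEMORY_RE.search(text)
    if m is None:
        return None
    return m.group(0).strip(), int(m.group(1))


print(find_memory(SAMPLE_TEXT))
```

Because the pattern is anchored per line, the surrounding blank lines and indentation never end up in the captured memory size; only the matched line needs a final `strip()`.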
Regarding "python - Web scraping data with line breaks", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51712984/