I'm trying to get the metadata from this website (here is the code).
import requests
from bs4 import BeautifulSoup
source = requests.get('https://www.svpboston.com/').text
soup = BeautifulSoup(source, features="html.parser")
title = soup.find("meta", name="description")
image = soup.find("meta", name="og:image")
print(title["content"] if title else "No meta title given")
print(image["content"]if title else "No meta title given")
But I get this error.
Traceback (most recent call last):
File "C:/Users/User/PycharmProjects/Work/Web Scraping/Selenium/sadsaddas.py", line 9, in <module>
title = soup.find("meta", name="description")
TypeError: find() got multiple values for argument 'name'
Any ideas?
Best answer
From the bs4 docs:
You can't use a keyword argument to search for HTML’s name element, because Beautiful Soup uses the name argument to contain the name of the tag itself. Instead, you can give a value to ‘name’ in the attrs argument
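As a minimal sketch of what the quoted passage describes, the snippet below reproduces the collision and then the workaround. The HTML string is a made-up example, not the markup of the site in the question, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = '<head><meta name="description" content="Example description"></head>'
soup = BeautifulSoup(html, features="html.parser")

try:
    # "meta" is already bound to find()'s first parameter, which is called
    # name, so passing name= again raises TypeError.
    soup.find("meta", name="description")
    collided = False
except TypeError:
    collided = True

# Passing the HTML attribute through attrs avoids the collision.
description = soup.find("meta", attrs={"name": "description"})
description_content = description["content"] if description is not None else None
print(collided, description_content)
```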
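To find a tag by a specific attribute, I suggest putting the attribute in a dictionary and passing that dictionary to .find() as the attrs argument. You are also using the wrong attribute to get the title and the image: these meta tags should be matched on property=<...> rather than name=<...>. Here is the final code that gets what you want: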
import requests
from bs4 import BeautifulSoup
source = requests.get('https://www.svpboston.com/').text
soup = BeautifulSoup(source, features="html.parser")
# Match the Open Graph tags on their property attribute via attrs
title = soup.find("meta", attrs={'property': 'og:title'})
image = soup.find("meta", attrs={'property': 'og:image'})
# Check each tag on its own before reading its content
print(title["content"] if title is not None else "No meta title given")
print(image["content"] if image is not None else "No meta image given")
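If the same lookup is needed for several tags, it could be wrapped in a small helper. The sketch below assumes a hypothetical function, get_meta_content, that tries the property attribute first and then falls back to name. It runs against a made-up HTML string so it works without a network request. None of these names come from bs4 itself.

```python
from bs4 import BeautifulSoup


def get_meta_content(soup, key, default=None):
    """Return the content of the first <meta> whose property or name equals key.

    Hypothetical helper for illustration; it is not part of bs4.
    """
    for attr in ("property", "name"):
        tag = soup.find("meta", attrs={attr: key})
        if tag is not None and tag.has_attr("content"):
            return tag["content"]
    return default


# Made-up markup standing in for a real page.
sample_html = """
<head>
  <meta property="og:title" content="Example title">
  <meta name="description" content="Example description">
</head>
"""
sample_soup = BeautifulSoup(sample_html, features="html.parser")

og_title = get_meta_content(sample_soup, "og:title")
description_text = get_meta_content(sample_soup, "description")
og_image = get_meta_content(sample_soup, "og:image", default="No meta image given")

print(og_title)
print(description_text)
print(og_image)
```

Returning a default instead of raising keeps the calling code free of repeated None checks, which is where the original snippet tested the wrong variable.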
Regarding python - get meta tag content by name with Beautiful Soup and Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66533085/