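Apologies if this has been asked before, but none of the solutions I've tried seem to work.
I've written a program where the user enters a word and the program pulls example sentences for that word from the Dictionary.com website.
I want to remove the HTML tags that always surround the keyword. How can I do that?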
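import requests
from bs4 import BeautifulSoup

word = input("Enter a word: ")
# Fetch the Dictionary.com entry page for the word
webContent = requests.get('https://www.dictionary.com/browse/' + word)
soup = BeautifulSoup(webContent.text, 'html.parser')
# Example sentences are <p> elements carrying this (site-specific) class string
results = soup.find_all('p', attrs={'class': 'one-click-content css-it69we e15kc6du7'})
firstResult = results[0]
# .contents mixes plain strings with Tag objects, so the markup is printed too
print(firstResult.contents[0:3])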
Best answer
import re
import requests
from bs4 import BeautifulSoup

word = input("Enter a word: ")
webContent = requests.get('https://www.dictionary.com/browse/' + word)
soup = BeautifulSoup(webContent.text, 'html.parser')
results = soup.find_all('p', attrs={'class': 'one-click-content css-it69we e15kc6du7'})
firstResult = results[0]
# Render each child (string or Tag) as text, then delete anything shaped like <...>
cleaned = [re.sub(r'<[^<]+?>', '', str(x)) for x in firstResult.contents]
print(cleaned[0:3])
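The regex above works for simple markup, but it does not decode entities such as `&amp;`, and a `>` inside a quoted attribute value (e.g. `<a title="x>y">`) ends the match early and leaves debris behind. A minimal sketch of a parser-based alternative, using only the standard library's `html.parser.HTMLParser`; the names `TextExtractor` and `strip_tags` are hypothetical helpers, and the sample HTML is made up rather than taken from Dictionary.com. If you are already using BeautifulSoup, calling `firstResult.get_text()` on the tag should give you the element's text without the markup in one step.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True (the default) turns &amp; etc. into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags; tags themselves are ignored
        self.parts.append(data)


def strip_tags(fragment):
    """Return the text content of an HTML fragment (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered at the end of the input
    return "".join(parser.parts)


if __name__ == "__main__":
    # Made-up sample shaped like an example sentence with a highlighted keyword
    sample = '<p class="example">the <span class="italic">word</span> in a sentence &amp; more</p>'
    print(strip_tags(sample))
```

Because the parser tokenizes tags properly, quoted attribute values and character references are handled for you instead of being left to a pattern.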
Regarding "python - How to remove HTML tags from output text?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53887905/