python - Remove newlines between HTML tags in Python 3

Tags: python python-3.x beautifulsoup html-parsing removing-whitespace

I want to trim all the whitespace and newlines so that the result changes from

<title>

     Asian Case Research Journal (World Scientific)

</title>

to this:

<title>Asian Case Research Journal (World Scientific)</title>
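A minimal sketch of the transformation described above, using only the standard library's html.parser.HTMLParser instead of BeautifulSoup. This is a swapped-in technique, not the asker's or the answerer's approach, and TitleGrabber and compact_title are hypothetical names. It also collapses runs of whitespace inside the title, which is slightly more than trimming the ends.

```python
from html import escape
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Collect the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so entities arrive decoded
        self.found = False      # saw a <title> start tag
        self._in_title = False  # currently inside the first <title>
        self.parts = []         # text chunks seen inside <title>

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.found:
            self.found = True
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def compact_title(html_text):
    """Return '<title>...</title>' with whitespace trimmed and collapsed, or None."""
    parser = TitleGrabber()
    parser.feed(html_text)
    parser.close()
    if not parser.found:
        return None
    # split() with no argument splits on any run of whitespace, including newlines
    text = " ".join("".join(parser.parts).split())
    # re-escape &, < and > because the parser decoded them
    return f"<title>{escape(text, quote=False)}</title>"


print(compact_title("<title>\n\n     Asian Case Research Journal (World Scientific)\n\n</title>"))
# → <title>Asian Case Research Journal (World Scientific)</title>
```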

My code:

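import requests
from bs4 import BeautifulSoup

# url_list: the asker's list of page URLs (defined elsewhere)
for link in url_list:
    try:
        r = requests.get(link)
        soup = BeautifulSoup(r.content, "html.parser")
        print(soup.title)  # prints the Tag, including the newlines around the text
    except Exception:
        print("No Title Found")
        continue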
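One way to fold the fix into the loop is to move the title handling into a small function that takes the raw page bytes. This is a sketch, not the original answer: clean_title is a hypothetical name, it uses the built-in "html.parser" backend so lxml is not required, and it returns None when a page has no <title> (soup.title is None in that case, so assigning to title.string would raise AttributeError).

```python
from bs4 import BeautifulSoup


def clean_title(html_bytes):
    """Return the <title> tag as a str with surrounding whitespace stripped,
    or None if the document has no <title> (hypothetical helper)."""
    soup = BeautifulSoup(html_bytes, "html.parser")
    title = soup.title
    if title is None:
        return None
    # Replace the tag's text with a copy stripped at both ends
    title.string = title.get_text(strip=True)
    # str(tag) renders the modified Tag back to markup
    return str(title)


page = b"<html><head><title>\n\n     Asian Case Research Journal (World Scientific)\n\n</title></head></html>"
print(clean_title(page))
# → <title>Asian Case Research Journal (World Scientific)</title>
```

Inside the asker's loop, print(soup.title) would then become something like print(clean_title(r.content) or "No Title Found").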

Best answer

import bs4

html = '''<title>

     Asian Case Research Journal (World Scientific)

</title>'''
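soup = bs4.BeautifulSoup(html, 'lxml')  # 'lxml' needs the lxml package; 'html.parser' is built in
title = soup.title
title.string = title.get_text(strip=True)  # replace the text with a copy stripped at both ends
print(str(title))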

Output:

<title>Asian Case Research Journal (World Scientific)</title>

In bs4, a Tag is an object with a .string attribute that you can read or modify with dot notation, and str(tag) converts the Tag object into a Python str.
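A small sketch of reading and writing .string, assuming a title whose text also contains a newline in the middle (the sample markup is made up for illustration). It shows that get_text(strip=True) only trims the ends of each text piece, so inner newlines survive; splitting on whitespace and re-joining with single spaces collapses them too.

```python
from bs4 import BeautifulSoup

html = "<title>\n   Asian Case\n   Research Journal  </title>"
soup = BeautifulSoup(html, "html.parser")
title = soup.title

# .string is the tag's single text child (a NavigableString)
print(repr(title.string))

# strip=True trims leading/trailing whitespace only; the inner newline and indent remain
stripped = title.get_text(strip=True)
print(repr(stripped))

# Splitting on any whitespace and re-joining collapses inner runs as well
title.string = " ".join(title.get_text().split())

# str(tag) renders the modified Tag as markup
print(str(title))
# → <title>Asian Case Research Journal</title>
```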

Documentation: modifying-string

For "python - Remove newlines between HTML tags in Python 3", see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/42288624/
