python - How do I remove carriage returns and newlines from the page title tag? (Google App Engine - Python)

I have this code to extract the title:

# Python 2 with BeautifulSoup 3; url is defined elsewhere
import urllib
import BeautifulSoup

soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")

Some sites put carriage returns or newlines before and after the title text (why?), so to remove them I added

.lstrip("\r\n").rstrip("\r\n")

This works for, e.g., http://www.readwriteweb.com/ but not for http://poundwire.com/ . Can you tell why it works for one and not the other?

Update

Following up on Steve Jessop's comment: I am now using replace and it seems to work:

title = str(soup.html.head.title.string).replace("\t", "").replace("\r", "").replace("\n", "")

If there is a better way, please let me know. Thanks.

Update 2

I found this answer, which looks better:

title = " ".join(str(soup.html.head.title.string).split())
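To see what the split/join idiom does, here is a minimal sketch on a made-up raw title. The sample string and the helper name normalize_title are assumptions for illustration only; they are not taken from either site.

```python
def normalize_title(raw):
    """Collapse every run of whitespace into one space and trim both ends."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, \r, \n) and discards empty strings at the edges.
    return " ".join(raw.split())


# Hypothetical raw title with leading, trailing, and internal whitespace.
sample = "\r\n\t  Example   Site \n  News\t\r\n"
print(normalize_title(sample))  # Example Site News
```

Unlike the chained replace calls, this keeps a single space between words instead of gluing together words that were separated only by a tab or newline.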

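Best answer

Try str(title).strip(). Called with no arguments, strip() trims all whitespace (spaces and tabs as well as \r and \n) from the beginning and end of the string.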
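A plausible explanation for the difference between the two sites, sketched below: when strip, lstrip, or rstrip is given a set of characters, it removes only those characters and stops at the first character outside the set, such as a tab or a space. The sample string is hypothetical; the actual title markup of the page in question is not shown in the post.

```python
# Hypothetical raw title: newlines, then a tab and spaces around the text.
raw = "\r\n\t My Page Title \r\n"

# Only \r and \n are removed; stripping stops at the tab and the space.
only_newlines = raw.lstrip("\r\n").rstrip("\r\n")

# With no argument, all leading and trailing whitespace is removed.
all_whitespace = raw.strip()

print(repr(only_newlines))   # '\t My Page Title '
print(repr(all_whitespace))  # 'My Page Title'
```

If a page indents its title with tabs or spaces after the newline, the character-set version leaves that indentation in place, while the plain strip() call removes it.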

Source: this question on Stack Overflow: https://stackoverflow.com/questions/5426523/
