python - BeautifulSoup special character parsing error

Tags: python beautifulsoup

I am using Beautiful Soup and urllib2 to scrape content from the web. This is the code I am using.

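from bs4 import BeautifulSoup
import urllib2  # Python 2 only; Python 3 uses urllib.request instead

# Download the raw page bytes.
html = urllib2.urlopen('http://plrplr.com/33717/mp3-player-guide/').read()
soup = BeautifulSoup(html, "lxml")
# Select the <div class="entry-content"> element that holds the article body.
contents = soup.find('div', {'class': 'entry-content'})
print contents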

But this is the result I get:

<div class="entry-content">
<p>MP3 player, also well known as digital audio player has become a staple of our gadget life. There are many brands of MP3 players on the market today. So, which MP3 player are the most suitable for you? That’s where this MP3 player guide comes in. <br/>
Basically, there are 3 types of MP3 player based on capacity: – <br/>
1. Hard drive MP3 player <br/>
– highest capacity <br/>
– largest in size <br/>
– heavy <br/>
– often labeled as an “Jukebox MP3 player� <br/>
– has moving parts <br/>
– example: Apple iPod video, Sony Network Walkman NW-HD5 <br/>

The special characters (curly quotes and dashes) come out garbled.

How can I get the exact source code, like this?

    <div class="entry-content">
        <p>MP3 player, also well known as digital audio player has become a staple of our gadget life. There are many brands of MP3 players on the market today. So, which MP3 player are the most suitable for you? That&#8217;s where this MP3 player guide comes in. </br><br />
Basically, there are 3 types of MP3 player based on capacity: &#8211; </br><br />
1. Hard drive MP3 player </br><br />
&#8211; highest capacity </br><br />
&#8211; largest in size </br><br />
&#8211; heavy </br><br />
&#8211; often labeled as an &#8220;Jukebox MP3 player&#8221; </br><br />
&#8211; has moving parts </br><br />
&#8211; example: Apple iPod video, Sony Network Walkman NW-HD5 </br><br />

I am running this code on a Windows 8 machine using Eclipse with PyDev.

Best answer

What you are probably looking for is contents.prettify(formatter="html"), which displays entity codes instead of non-ASCII characters.

I can't test it on my machine, but here is the documentation I used: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#output-formatters
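
The formatter suggestion can be tried on a small inline sample without fetching the page. The following is a minimal sketch, not a tested fix for the original site: it assumes bs4 is installed, is written for Python 3, uses the built-in "html.parser" instead of lxml to avoid an extra dependency, and the sample markup is a made-up fragment based on the text in the question. Note that formatter="html" emits named entities where they exist, so if numeric references such as &#8217; are wanted, encoding the tag to ASCII is one possible alternative.

```python
from bs4 import BeautifulSoup

# Hypothetical sample built from the question's text; \u2019, \u2013,
# \u201c and \u201d are the curly quote and dash characters involved.
sample = (
    '<div class="entry-content"><p>That\u2019s where this MP3 player '
    'guide comes in. \u2013 often labeled as an \u201cJukebox MP3 '
    'player\u201d</p></div>'
)

soup = BeautifulSoup(sample, "html.parser")
contents = soup.find("div", {"class": "entry-content"})

# Option 1 (from the answer): substitute HTML entities on output.
named = contents.prettify(formatter="html")

# Option 2 (an alternative, not from the answer): encoding to ASCII makes
# Beautiful Soup replace unencodable characters with numeric references.
numeric = contents.encode("ascii").decode("ascii")

print(named)
print(numeric)
```

Both results contain only ASCII characters, so they should print the same way regardless of the console encoding, which may matter for the Eclipse console on Windows.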

Regarding python - BeautifulSoup special character parsing error, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29102531/
