python - Unable to get data using BeautifulSoup

I am trying to write a simple script with Beautiful Soup that scrapes just two pieces of information from a website and generates an SQL file from them.

```python
import mechanize
import urlparse
from bs4 import BeautifulSoup

op = mechanize.Browser()
op.open("https://www.mentalhelp.net/symptoms/")
for link in op.links():
print link.text
print urlparse.urljoin(link.base_url, link.url)
get = BeautifulSoup(urllib2.urlopen("https://www.mentalhelp.net/symptoms/").read()).findAll('p')
print get
print "\n"
```

Error:

```
C:\Python27>python symtoms.py
  File "symtoms.py", line 8
    print link.text
        ^
IndentationError: expected an indented block
```

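I just want a script that scrapes those items and their short descriptions and generates an SQL file with only two fields, "name" and "sug": "name" holds the items and "sug" holds the descriptions.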
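For the SQL-file part of the requirement, here is a minimal sketch (not from the original thread), assuming the scraped items are already collected as a list of `(name, sug)` tuples. The table name `symptoms`, the `TEXT` column types and the helper `build_sql` are hypothetical choices. It uses the standard-library `sqlite3` module: rows go into an in-memory database through parameterised queries, and `Connection.iterdump()` turns the result into SQL text with quotes escaped correctly.

```python
import sqlite3


def build_sql(rows):
    """Return SQL text that recreates a `symptoms` table holding the given pairs.

    Assumption: the table name "symptoms" and the TEXT column types are
    illustrative; adjust them to the database the file will be loaded into.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE symptoms (name TEXT, sug TEXT)")
        # Parameterised inserts let sqlite3 escape quotes such as "patient's"
        conn.executemany("INSERT INTO symptoms (name, sug) VALUES (?, ?)", rows)
        conn.commit()
        # iterdump() yields the schema and every row as SQL statements
        return "\n".join(conn.iterdump()) + "\n"
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder rows; in a real script these would come from the scraper
    sample = [("Example symptom", "A placeholder that isn't real data")]
    print(build_sql(sample))
```

Writing the returned string to a file such as `symptoms.sql` gives a two-field SQL file of the kind described above. The dump uses SQLite syntax, so another database may need small adjustments.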

Best answer

Indentation is significant in Python: it is what delimits blocks such as for loops, if blocks, while loops, functions, and so on.

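In the code you provided, the statements after the `for` line are not indented inside the loop, and a for loop requires at least one statement in its body, so Python stops with `IndentationError: expected an indented block`. I assume you want the lines below the `for` statement to run inside the loop, so you should indent them.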
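The error is raised when Python compiles the file, before any scraping code runs. A small sketch using the built-in `compile()` function (the helper `check_syntax` is hypothetical) shows the same loop shape failing without indentation and compiling once its body is indented:

```python
def check_syntax(source):
    """Return None if `source` compiles, else the name of the error class raised."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return type(exc).__name__
    return None


broken = "for x in [1, 2]:\nprint(x)\n"     # loop body not indented
fixed = "for x in [1, 2]:\n    print(x)\n"  # loop body indented by four spaces

print(check_syntax(broken))  # IndentationError
print(check_syntax(fixed))   # None
```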

Code:

```python
for link in op.links():
    print link.text
    print urlparse.urljoin(link.base_url, link.url)
    get = BeautifulSoup(urllib2.urlopen("https://www.mentalhelp.net/symptoms/").read()).findAll('p')
    print get
    print "\n"
```

I'm not sure this will give you the result you want, but it will fix your current error.

For the new requirement of getting only the classic symptoms and their descriptions, you can use:

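```python
import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://www.mentalhelp.net/symptoms/").read(), "html.parser")
for div in soup.findAll('div', {'id': 'page'}):
    for entrydiv in div.findAll('div', {'class': 'h4 entry-title'}):
        print(entrydiv.get_text())
        desc = entrydiv.find_next_sibling()  # next_sibling is often just whitespace
        if desc is not None:
            print(desc.get_text())
```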
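One detail worth checking in this kind of loop: `.next_sibling` returns the very next node, which in indented HTML is usually a whitespace text node rather than the description element, while `.find_next_sibling()` skips text and returns the next tag. The sketch below runs offline against a hypothetical HTML fragment shaped like the structure this answer targets (`div#page` containing `h4 entry-title` divs, each followed by a description). The live page may be structured differently, and `extract_entries` and `SAMPLE_HTML` are illustrative names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to mirror the page structure the answer targets
SAMPLE_HTML = """
<div id="page">
  <div class="h4 entry-title">Anxiety</div>
  <p>Placeholder description one.</p>
  <div class="h4 entry-title">Insomnia</div>
  <p>Placeholder description two.</p>
</div>
"""


def extract_entries(html):
    """Return a list of (name, sug) pairs from the assumed markup."""
    soup = BeautifulSoup(html, "html.parser")
    page = soup.find("div", id="page")
    if page is None:
        return []
    pairs = []
    # class_ matches any div whose class list contains "entry-title"
    for title in page.find_all("div", class_="entry-title"):
        # .next_sibling would be the "\n  " text node between the tags;
        # find_next_sibling() skips text nodes and returns the next Tag or None
        desc = title.find_next_sibling()
        sug = desc.get_text(strip=True) if desc is not None else ""
        pairs.append((title.get_text(strip=True), sug))
    return pairs


print(extract_entries(SAMPLE_HTML))
```

It returns `(name, sug)` pairs, matching the two fields the question asks for. If a title were followed directly by another title, `find_next_sibling()` would return that title instead, so a real scraper may need a stricter check.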

Regarding python - Unable to get data using BeautifulSoup, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32142869/
