python - Grabbing a substring while scraping with Python 2.6

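Hey, can someone help with the following problem?

I'm trying to scrape a website that contains the information below. I only need to extract the number that follows the </strong> tag.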

[<li><strong>ISBN-13:</strong> 9780375853401</li>, <li><strong>Pub. Date: </strong> 05/11/2010</li>]
[<li><strong>UPC:</strong> 490355000372</li>, <li><strong>Catalog No:</strong> 15024/25</li>, <li><strong>Label:</strong> CAMERATA</li>]

Here is the code I'm using with mechanize and BeautifulSoup to get the data above. I'm stuck here because it won't let me call find() on a list.

br_results = mechanize.urlopen(br_results)
html = br_results.read()
soup = BeautifulSoup(html)
local_links = soup.findAll("a", {"class" : "down-arrow csa"})
upc_code = soup.findAll("ul", {"class" : "bc-meta3"})
for upc in upc_code:
    upc_text = upc.contents.contents
    print upc_text

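Best answer

I assume upc_code is the list you showed us, and that local_links has nothing to do with your question, given that you don't mention it again in your code?

So I'm not sure what upc_text is supposed to be. In your loop body, upc is a ul Tag, so upc.contents will be a list of li tags (presumably), and I don't see how upc.contents.contents could work. What result do you see from that code? I would have expected an exception!

Anyway, this is how I would write the loop: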

for upc in upc_code:
    listitems = upc.findAll('li')
    for anitem in listitems:
        print anitem.contents[1]

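This works because you seem to want the second child of each list item: the first is the strong tag, and the second is the NavigableString you want.

If the second child of each list item is not what you want, please say so. For example, you could locate the strong tag and take its next sibling, if that suits you better. Just change the body of the nested loop to: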

print anitem.find('strong').nextSibling
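
For anyone who wants to try the "text right after the strong tag" idea without installing anything, here is a minimal sketch. It is not the BeautifulSoup approach above: it swaps in the standard library's html.parser and uses Python 3 syntax. The class name StrongSiblingParser and the ul wrapper in the sample markup are my own assumptions for illustration; the two li items are copied from the question.

```python
from html.parser import HTMLParser


class StrongSiblingParser(HTMLParser):
    """Hypothetical helper: collect (label, value) pairs from
    <li><strong>label</strong> value</li> items."""

    def __init__(self):
        super().__init__()
        self.pairs = []           # finished (label, value) tuples
        self._in_li = False       # currently inside an <li>
        self._in_strong = False   # currently inside a <strong>
        self._label = []          # text seen inside <strong>
        self._value = []          # text seen after </strong>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self._label, self._value = [], []
        elif tag == "strong" and self._in_li:
            self._in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False
        elif tag == "li" and self._in_li:
            # Trim whitespace and the trailing colon from the label.
            label = "".join(self._label).strip().rstrip(":")
            value = "".join(self._value).strip()
            self.pairs.append((label, value))
            self._in_li = False

    def handle_data(self, data):
        if not self._in_li:
            return
        # Text inside <strong> is the label; any other text in the <li> is the value.
        (self._label if self._in_strong else self._value).append(data)


# Sample markup: the <ul> wrapper is assumed, the items come from the question.
sample = (
    '<ul class="bc-meta3">'
    "<li><strong>ISBN-13:</strong> 9780375853401</li>"
    "<li><strong>Pub. Date: </strong> 05/11/2010</li>"
    "</ul>"
)

parser = StrongSiblingParser()
parser.feed(sample)
parser.close()
fields = dict(parser.pairs)
print(fields)  # {'ISBN-13': '9780375853401', 'Pub. Date': '05/11/2010'}
```

Keying the values by label means you can pick out fields["ISBN-13"] directly instead of relying on the position of each list item, which may matter if the page does not always list the fields in the same order.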

This question and answer originally appeared on Stack Overflow as "Grabbing a substring while scraping with Python 2.6": https://stackoverflow.com/questions/2845689/
