python - HTML parsing with bs4

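I am parsing an HTML page and am having trouble figuring out how to extract a particular "p" tag that has no class or ID. I am trying to reach the "p" tag that contains the latitude and longitude. Here is my current code: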

 import bs4
 from urllib.request import urlopen as uReq #this opens the URL (urlopen lives in urllib.request on Python 3)
 from bs4 import BeautifulSoup as soup #parses/cuts  the html

 my_url = 'http://www.fortwiki.com/Battery_Adair'
 print(my_url)
 uClient = uReq(my_url) #opens the URL and stores the response in uClient

 page_html = uClient.read() # reads the URL
 uClient.close() # closes the URL

 page_soup = soup(page_html, "html.parser") #parses/cuts the HTML
 containers = page_soup.find_all("table")
 for container in containers:
     title = container.tr.p.b.text.strip()
     history = container.tr.p.text.strip()
     lat_long = container.tr.table
     print(title)
     print(history)
     print(lat_long)

Link to the site: http://www.fortwiki.com/Battery_Adair

Best answer

The tag you are looking for is very common in the document and has no unique attributes, so we cannot select it directly.

One possible solution is to select the tag by index, as in bloopiebloopie's answer.
However, this will not work unless you know the exact position of the tag.
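
To make the limitation concrete, here is a minimal sketch of index-based selection. The two HTML fragments and the helper name third_paragraph_text are made up for illustration and are not the real fortwiki markup; the point is only that a fixed index stops pointing at the coordinates as soon as the page layout shifts.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page is laid out differently.
page_a = (
    "<div><p>Intro</p>"
    "<p><b>Maps &amp; Images</b></p>"
    "<p>Lat: 48.0 Long: -122.0</p></div>"
)
# Same content, but with one extra paragraph before the coordinates.
page_b = (
    "<div><p>Intro</p><p>History</p>"
    "<p><b>Maps &amp; Images</b></p>"
    "<p>Lat: 48.0 Long: -122.0</p></div>"
)


def third_paragraph_text(html):
    """Return the text of the third <p>, i.e. selection by a fixed index."""
    paragraphs = BeautifulSoup(html, "html.parser").find_all("p")
    return paragraphs[2].text


print(third_paragraph_text(page_a))  # the coordinates paragraph
print(third_paragraph_text(page_b))  # the label paragraph, not the coordinates
```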

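Another possible solution is to find a nearby tag that does have a distinguishing attribute or text, and then select the tag we want relative to it.
In this case, we can find the preceding tag with the text "Maps & Images" and use find_next to select the tag that follows it.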

import requests
from bs4 import BeautifulSoup

url = 'http://www.fortwiki.com/Battery_Adair'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

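# Locate the <b> label by its exact text ("string" is the current name of the older "text" argument)
b = soup.find('b', string='Maps & Images')
if b:
    # Take the text of the element that follows the label
    lat_long = b.find_next().text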

This method should find the coordinate data on any www.fortwiki.com page that has a map.
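
If the same lookup is needed on several pages, it can be wrapped in a small helper that returns None when the label is missing. This is only a sketch under stated assumptions: the function name text_after_label, its parameters, and the sample markup are hypothetical, and it asks find_next for a "p" tag explicitly on the assumption that the coordinates sit in the next paragraph after the label.

```python
from bs4 import BeautifulSoup


def text_after_label(html, label, label_tag="b", target_tag="p"):
    """Return the text of the first <target_tag> after the <label_tag>
    whose string equals label, or None if either one is missing."""
    soup = BeautifulSoup(html, "html.parser")
    marker = soup.find(label_tag, string=label)
    if marker is None:
        return None
    target = marker.find_next(target_tag)
    if target is None:
        return None
    return target.get_text(strip=True)


# Hypothetical markup for illustration only, not the real fortwiki page.
sample = """
<table><tr><td>
  <p><b>Maps &amp; Images</b></p>
  <p>Lat: 48.0 Long: -122.0</p>
</td></tr></table>
"""

print(text_after_label(sample, "Maps & Images"))
print(text_after_label(sample, "No such label"))
```

Returning None instead of raising keeps a loop over many pages from stopping at the first page that has no map section.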

Regarding python - HTML parsing with bs4, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49618479/
