python - How do I find and store information when class and id are empty?

Tags: python web-scraping beautifulsoup python-requests

In Python 3, I want to extract information from a site and store it in variables.

For example, from the "Dados do processo" block I want to store:

"Indenização por Dano Moral"
"Direito de Imagem"
"Violeta Miera Arriba"
"R$ 38.160,00"

This is how I isolate the block:

from bs4 import BeautifulSoup
import requests

link = 'https://esaj.tjsp.jus.br/cpopg/show.do?processo.codigo=01001DTQA0000&processo.foro=1&uuidCaptcha=sajcaptcha_380320b510ee415ca0ca56cfac794999'

try:
    # verify=False avoids the SSLError but also disables certificate checks
    res = requests.get(link, verify=False)
    res.raise_for_status()
except requests.exceptions.RequestException as e:
    # RequestException is the base class of HTTPError, ConnectionError and Timeout
    raise SystemExit(str(e))  # stop here: res is unusable if the request failed

soup = BeautifulSoup(res.text, "lxml")

janela1 = soup.find_all("table",{"class":"secaoFormBody"})[1]

dados_processo = janela1.find_all("tr",{"class":""})

For example, the information "Indenização por Dano Moral" sits inside dados_processo:

<tr class="">
<td id="" valign="" width="150">
<label class="labelClass" for="" style="text-align:right;font-weight:bold;;">Assunto:</label>
</td>
<td valign="">
<span class="" id="">Indenização por Dano Moral</span>
</td>
</tr>

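Does anyone know how to reach the span with class="" and id=""? I can't work it out, because the same pattern repeats at several points in the block, always with "" for the class and "" for the id.

I thought about searching for the string "Assunto:" in the label with class="labelClass" and for="" and, if it is found, taking the string from the span with class="" and id="". That check is useful because some similar sites may not contain all of the items.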
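The lookup described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it runs against the single row quoted earlier instead of the live page, uses the built-in "html.parser" so no extra parser is needed, and value_for_label is a made-up helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Sample markup: the one row quoted in the question, wrapped in a table.
HTML = """
<table>
<tr class="">
<td id="" valign="" width="150">
<label class="labelClass" for="" style="text-align:right;font-weight:bold;;">Assunto:</label>
</td>
<td valign="">
<span class="" id="">Indenização por Dano Moral</span>
</td>
</tr>
</table>
"""


def value_for_label(soup, label_text, default=None):
    """Hypothetical helper: text of the cell next to the given label, or default."""
    label = soup.find(
        "label",
        class_="labelClass",
        # The filter receives the tag's .string, which can be None.
        string=lambda s: s is not None and s.strip() == label_text,
    )
    if label is None:
        return default  # this page does not have that item
    label_cell = label.find_parent("td")
    value_cell = label_cell.find_next_sibling("td") if label_cell else None
    if value_cell is None:
        return default
    return value_cell.get_text(strip=True)


soup = BeautifulSoup(HTML, "html.parser")
assunto = value_for_label(soup, "Assunto:")
juiz = value_for_label(soup, "Juiz:", default="N/A")  # absent in the sample
print(assunto)
print(juiz)
```

Because the search is keyed on the label text, the empty class and id attributes on the span never need to be matched at all.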

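Best answer

You can use :contains to target the cell holding the "heading" label, then the adjacent sibling (+) combinator to get the td that contains the value of interest. This uses bs4 4.7.1.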

from bs4 import BeautifulSoup as bs
import requests
import urllib3; urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

r = requests.get('https://esaj.tjsp.jus.br/cpopg/show.do?processo.codigo=01001DTQA0000&processo.foro=1&uuidCaptcha=sajcaptcha_380320b510ee415ca0ca56cfac794999', verify=False)
soup = bs(r.content, 'lxml')
print(soup.select_one('td:has(>.labelClass:contains("Assunto:")) + td').text.strip())
print(soup.select_one('td:has(>.labelClass:contains("Outros assuntos:")) + td').text.strip())
print(soup.select_one('td:has(>.labelClass:contains("Juiz:")) + td').text.strip())
print(soup.select_one('td:has(>.labelClass:contains("Valor da ação:")) + td').text.strip())
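A note for newer installs: Soup Sieve, the selector library behind select_one, deprecated :contains in version 2.1 in favour of :-soup-contains. The sketch below shows the same selector with the newer spelling. It assumes Soup Sieve 2.1 or later and runs against the single row quoted in the question, not the live page.

```python
from bs4 import BeautifulSoup

# Sample markup: the one row quoted in the question, wrapped in a table.
HTML = """
<table>
<tr class="">
<td id="" valign="" width="150">
<label class="labelClass" for="" style="text-align:right;font-weight:bold;;">Assunto:</label>
</td>
<td valign="">
<span class="" id="">Indenização por Dano Moral</span>
</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Same idea as the selectors above: find the td whose direct child is the
# matching label, then step to the td immediately after it.
cell = soup.select_one('td:has(> .labelClass:-soup-contains("Assunto:")) + td')
assunto = cell.get_text(strip=True)
print(assunto)
```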

You can add an if test for existence, in case an item is missing:

soup.select_one('td:has(>.labelClass:contains("Assunto:")) + td').text.strip() if soup.select_one('td:has(>.labelClass:contains("Assunto:")) + td') is not None else 'N/A'
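If several fields are needed, the existence test can be wrapped once so that each selector only runs a single time. The following is a sketch under assumptions: select_text and the dados dictionary are illustrative names, the markup is just the row quoted in the question, and the selector uses :-soup-contains, the newer spelling of :contains (Soup Sieve 2.1 or later).

```python
from bs4 import BeautifulSoup

# Sample markup: only the "Assunto:" row from the question is present,
# so the other labels fall back to the default.
HTML = """
<table>
<tr class="">
<td id="" valign="" width="150">
<label class="labelClass" for="" style="text-align:right;font-weight:bold;;">Assunto:</label>
</td>
<td valign="">
<span class="" id="">Indenização por Dano Moral</span>
</td>
</tr>
</table>
"""

# Labels taken from the selectors in the answer.
LABELS = ["Assunto:", "Outros assuntos:", "Juiz:", "Valor da ação:"]


def select_text(soup, selector, default="N/A"):
    """Hypothetical helper: stripped text of the first match, or default."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


soup = BeautifulSoup(HTML, "html.parser")
dados = {
    label: select_text(
        soup, f'td:has(> .labelClass:-soup-contains("{label}")) + td'
    )
    for label in LABELS
}
print(dados)
```

Keeping the results in a dictionary keyed by label makes it easy to see which items a given page lacked.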

Source: a similar question on Stack Overflow, "python - How do I find and store information when class and id are empty?": https://stackoverflow.com/questions/55817274/
