python - Unable to parse the number used in a link

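I wrote a Python script to get the value of Tax District from a web page. Its main page has a form that must be filled in to generate the results, which is where the information I am looking for appears. The script below gives me the result I want, but the problem is that I have to use a different link to parse the result. The link used in my script only becomes available after the form is filled in. That newly generated link contains a number, and I don't know how to find it.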

Main link

In the search form there is a radio button, Street Address, which is selected by default. Then:

house number: 5587 (just above Exact/Low)
street name: Surrey

This is the automatically generated link, https://wedge.hcauditor.org/view/re/5500171005200/2018/summary, which contains the number 5500171005200.

I wrote the following script to get the result, but I really don't know how the number in that URL is generated when I use different search terms:

import requests
from bs4 import BeautifulSoup

url = 'https://wedge.hcauditor.org/view/re/5500171005200/2018/summary'

r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
item = soup.select_one("div:contains('Tax District') + div").text
print(item)

How can I get the number used in the newly generated link?

Best answer

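It looks like a POST followed by a GET works fine, so there is no need to look up that number. I use a Session to carry the cookies between the two requests. That said, the link you refer to can be found in the GET response.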

import requests
from bs4 import BeautifulSoup as bs

data = {
    'search_type': 'Address',
    'sort_column': 'Address',
    'site_house_number_low':5587,
    'site_house_number_high':'',
    'site_street_name': 'surrey'  
}

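with requests.Session() as s:
    # Submit the search form; the session keeps the cookies the site sets
    r = s.post('https://wedge.hcauditor.org/execute', data = data)
    # Fetch the result page with the same session
    r = s.get('https://wedge.hcauditor.org/view_result/0')
    soup = bs(r.content,'lxml')
    # Print the text of the div that directly follows the first .label element
    print(soup.select_one('.label + div').text)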

You can see the details and the sequence of requests by capturing the network traffic. I happened to use Fiddler here.
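
If the number itself is still needed, one option is to search the GET response text for links shaped like the one in the question. The following is only a minimal sketch: the URL pattern is inferred from the single example link above, the helper name find_parcel_links is made up for illustration, and the sample string is a hypothetical stand-in for r.text, since the real page markup is not shown here.

```python
import re

# Assumed URL shape, inferred from the example link in the question:
# /view/re/<number>/<year>/summary
PARCEL_LINK = re.compile(r"/view/re/(\d+)/(\d{4})/summary")

def find_parcel_links(html):
    """Return unique (number, year) pairs found in the HTML, in order of appearance."""
    seen = []
    for match in PARCEL_LINK.finditer(html):
        pair = (match.group(1), match.group(2))
        if pair not in seen:
            seen.append(pair)
    return seen

# Hypothetical fragment standing in for r.text from the GET response
sample = '<a href="/view/re/5500171005200/2018/summary">Summary</a>'
print(find_parcel_links(sample))
```

With the session code above, the call would be find_parcel_links(r.text) after the GET request. An empty list would mean the response does not contain links in this assumed form.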

Regarding "python - Unable to parse the number used in a link", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57013317/
