I've created a script in Python using urllib.request to fetch a page through an https proxy. I tried the approaches below, but ran into different errors, such as urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed>. The script is supposed to fetch the IP address from that site. The IP addresses used within the script are placeholders. I've already followed the suggestions made here.
First attempt:
import urllib.request
from bs4 import BeautifulSoup
url = 'https://whatismyipaddress.com/proxy-check'
headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
proxy_host = '60.191.11.246:3128'
req = urllib.request.Request(url,headers=headers)
req.set_proxy(proxy_host, 'https')
resp = urllib.request.urlopen(req).read()
soup = BeautifulSoup(resp,"html5lib")
ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
print(ip_addr)
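As a side note on what set_proxy actually does here: for an https URL it switches the connection target to the proxy and remembers the original host for the CONNECT tunnel. This can be checked offline with the same placeholder address, without sending any request:

```python
import urllib.request

url = 'https://whatismyipaddress.com/proxy-check'
proxy_host = '60.191.11.246:3128'  # placeholder, as above

req = urllib.request.Request(url)
req.set_proxy(proxy_host, 'https')

# After set_proxy, the request connects to the proxy host and tunnels to the site.
print(req.host)   # → 60.191.11.246:3128
print(req.type)   # → https
```

So the WinError 10060 timeout here is the proxy itself not answering, not a problem with how the request is built.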
Another way (using os.environ):

import os
import urllib.request
from bs4 import BeautifulSoup

url = 'https://whatismyipaddress.com/proxy-check'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
proxy = '60.191.11.246:3128'
os.environ["https_proxy"] = f'http://{proxy}'
req = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(req).read()
soup = BeautifulSoup(resp, "html5lib")
ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
print(ip_addr)
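For the os.environ route, urllib picks the variable up through getproxies(), so the mapping urllib will actually use can be confirmed offline (same placeholder address, no live connection needed):

```python
import os
import urllib.request

proxy = '60.191.11.246:3128'  # placeholder, as above
os.environ["https_proxy"] = f'http://{proxy}'

# getproxies() returns the scheme-to-proxy mapping urllib uses by default.
print(urllib.request.getproxies().get('https'))  # → http://60.191.11.246:3128
```

If this prints the expected value but requests still time out, the variable is being read correctly and the proxy endpoint itself is the problem.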
Another method I tried:

import urllib.request
from bs4 import BeautifulSoup

url = 'https://whatismyipaddress.com/proxy-check'
agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
proxy_host = '205.158.57.2:53281'
proxy = {'https': f'http://{proxy_host}'}
proxy_support = urllib.request.ProxyHandler(proxy)
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
opener.addheaders = [('User-agent', agent)]
res = opener.open(url).read()
soup = BeautifulSoup(res, "html5lib")
ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
print(ip_addr)
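One thing worth knowing about this variant: install_opener makes the opener global, so a plain urllib.request.urlopen afterwards also goes through the proxy. Whether the ProxyHandler is actually wired into the opener can be checked offline (no request is sent):

```python
import urllib.request

proxy_host = '205.158.57.2:53281'  # placeholder, as above
proxy_support = urllib.request.ProxyHandler({'https': f'http://{proxy_host}'})
opener = urllib.request.build_opener(proxy_support)

# build_opener keeps our handler alongside the defaults; confirm it is present.
print(any(isinstance(h, urllib.request.ProxyHandler)
          for h in opener.handlers))  # → True
```

If the handler is present and the request still fails, the construction is fine and the failure happens at the network level.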
What is the correct way to use an https proxy with urllib.request?
Best Answer
While testing the proxies, Google's services flagged "unusual traffic from your computer network", which is why the responses came back as errors: whatismyipaddress relies on Google's services. The problem did not affect other sites such as stackoverflow.
from urllib import request
from bs4 import BeautifulSoup
url = 'https://whatismyipaddress.com/proxy-check'
proxies = {
    # 'https': 'https://167.172.229.86:8080',
    # 'https': 'https://51.91.137.248:3128',
    'https': 'https://118.70.144.77:3128',
}
user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
headers = {
    'User-Agent': user_agent,
    'accept-language': 'ru,en-US;q=0.9,en;q=0.8,tr;q=0.7'
}
proxy_support = request.ProxyHandler(proxies)
opener = request.build_opener(proxy_support)
# opener.addheaders = [('User-Agent', user_agent)]
request.install_opener(opener)
req = request.Request(url, headers=headers)
try:
    response = request.urlopen(req).read()
    soup = BeautifulSoup(response, "html5lib")
    ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
    print(ip_addr)
except Exception as e:
    print(e)
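A detail that trips people up when switching between opener.addheaders (the commented-out line) and Request(headers=...): Request normalizes header names to capitalized form internally, e.g. 'User-Agent' is stored as 'User-agent'. A minimal offline check:

```python
import urllib.request

user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
req = urllib.request.Request('https://whatismyipaddress.com/proxy-check',
                             headers={'User-Agent': user_agent})

# Request stores header names as key.capitalize(), so look it up as 'User-agent'.
print(req.get_header('User-agent') == user_agent)  # → True
```

Either way of attaching the header works on the wire; this only matters when reading headers back off the Request object.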
Regarding "python - Can't use https proxy within urllib.request", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59594692/