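I'm having trouble expanding short URLs, because not all of the ones I use redirect in the same way.
The idea is to expand shortened URLs. Below are examples of the form short URL --> final URL. I need a function that takes the shortened URL and returns the expanded URL.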
http://chollo.to/675za --> http://www.elcorteingles.es/limite-48-horas/equipaje/?sorting=priceAsc&aff_id=2118094&dclid=COvjy8Xrz9UCFeMi0wod4ZULuw
So I have something half-working, but it fails on some of the examples above:
import requests
import httplib
import urlparse

def unshorten_url(url):
    try:
        parsed = urlparse.urlparse(url)
        h = httplib.HTTPConnection(parsed.netloc)
        h.request('HEAD', parsed.path)
        response = h.getresponse()
        if response.status / 100 == 3 and response.getheader('Location'):
            url = requests.get(response.getheader('Location')).url
            print url
            return url
        else:
            url = requests.get(url).url
            print url
            return url
    except Exception as e:
        print(e)
Best answer
According to requests, the expected redirect does not appear to be well-formed:
import requests
response = requests.get('http://chollo.to/675za')
for resp in response.history:
    print(resp.status_code, resp.url)
print(response.url)
print(response.is_redirect)
Output:
301 http://chollo.to/675za
http://web.epartner.es/click.asp?ref=754218&site=14010&type=text&tnb=39&diurl=https%3A%2F%2Fad.doubleclick.net%2Fddm%2Fclk%2F302111021%3B129203261%3By%3Fhttp%3A%2F%2Fwww.elcorteingles.es%2Flimite-48-horas%2Fequipaje%2F%3Fsorting%3DpriceAsc%26aff_id%3D2118094
False
This may be intentional on the part of epartner or doubleclick. For nested URLs of this kind you need an extra step, for example:
from urllib.parse import unquote
# from urllib import unquote # python2
# if response.url.count('http') > 1:
url = 'http' + response.url.split('http')[-1]
unquote(url)
# http://www.elcorteingles.es/limite-48-horas/equipaje/?sorting=priceAsc&aff_id=2118094
Note: doing this may bypass the intended ad revenue.
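The two steps above (following the redirects, then pulling out the nested target) could be combined roughly as follows. This is only a sketch for Python 3: the helper name `extract_nested_url` and the `timeout` parameter are illustrative choices, not part of the original answer, and it assumes the final destination is the last `http://` or `https://` URL embedded in the tracking link.

```python
from urllib.parse import unquote


def extract_nested_url(url):
    """Return the last URL embedded in a tracking link, or url unchanged.

    Hypothetical helper: assumes the real target is the last
    'http://' or 'https://' occurrence after percent-decoding once.
    """
    decoded = unquote(url)
    # Position of the last embedded scheme, whichever of the two comes later.
    idx = max(decoded.rfind('http://'), decoded.rfind('https://'))
    if idx <= 0:
        # No nested URL found (the only scheme is at the very start).
        return url
    return decoded[idx:]


def unshorten_url(short_url, timeout=10):
    """Follow HTTP redirects, then unwrap any nested target URL."""
    # requests is third-party; imported here so extract_nested_url
    # stays usable without it.
    import requests

    response = requests.get(short_url, timeout=timeout)
    return extract_nested_url(response.url)
```

Calling `unshorten_url('http://chollo.to/675za')` would perform the network request. `extract_nested_url` can be tried on its own against the intermediate URL printed in the output above. Anything after the nested target in the outer URL would be carried along as well, so the result may need further trimming for other link formats.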
Regarding the Python short URL expander, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45663681/