Python short URL expander

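I'm having trouble expanding short URLs, because not all of the shorteners I use redirect in the same way.

The idea is to expand a shortened URL. Here are a few examples, in the form short URL --> final URL. I need a function that takes the shortened URL and returns the expanded URL.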

http://chollo.to/675za --> http://www.elcorteingles.es/limite-48-horas/equipaje/?sorting=priceAsc&aff_id=2118094&dclid=COvjy8Xrz9UCFeMi0wod4ZULuw

So I have something semi-working, but it fails on some of the examples above:

import requests
import httplib
import urlparse


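def unshorten_url(url):
    try:
        # Send a HEAD request to see whether the short URL redirects (Python 2)
        parsed = urlparse.urlparse(url)
        h = httplib.HTTPConnection(parsed.netloc)
        h.request('HEAD', parsed.path)
        response = h.getresponse()

        if response.status / 100 == 3 and response.getheader('Location'):
            # Redirect: let requests follow the Location header to the end
            url = requests.get(response.getheader('Location')).url
            print url
            return url

        else:
            # No redirect header: let requests resolve the original URL
            url = requests.get(url).url
            print url
            return url

    except Exception as e:
        print(e)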

Best answer

The expected redirect does not seem to be well-formed according to requests:

import requests

response = requests.get('http://chollo.to/675za')
for resp in response.history:
    print(resp.status_code, resp.url)
print(response.url)
print(response.is_redirect)

Output:

301 http://chollo.to/675za
http://web.epartner.es/click.asp?ref=754218&site=14010&type=text&tnb=39&diurl=https%3A%2F%2Fad.doubleclick.net%2Fddm%2Fclk%2F302111021%3B129203261%3By%3Fhttp%3A%2F%2Fwww.elcorteingles.es%2Flimite-48-horas%2Fequipaje%2F%3Fsorting%3DpriceAsc%26aff_id%3D2118094
False

This may be intentional on the part of epartner or doubleclick. For nested URLs like these, you need an extra step, for example:

from urllib.parse import unquote
# from urllib import unquote # python2

# if response.url.count('http') > 1:
# Keep only the text after the last 'http' and restore the scheme prefix
url = 'http' + response.url.split('http')[-1]
print(unquote(url))

# http://www.elcorteingles.es/limite-48-horas/equipaje/?sorting=priceAsc&aff_id=2118094
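The string split above relies on the target being the last 'http' in the URL. A slightly more structured variant, as a rough sketch only, is to parse the query string and keep unwrapping whatever parameter holds a URL. The names innermost_url and max_depth are made up for illustration, and the sketch assumes the nested URL is either a percent-encoded parameter value (as in the epartner link) or the raw query string itself (as in the doubleclick link).

```python
from urllib.parse import parse_qsl, urlsplit

_SCHEMES = ("http://", "https://")


def innermost_url(url, max_depth=5):
    """Return the innermost URL embedded in url's query string, or url itself.

    Hypothetical helper: max_depth only guards against endless unwrapping.
    """
    for _ in range(max_depth):
        query = urlsplit(url).query
        if query.startswith(_SCHEMES):
            # The whole query string is a raw URL (doubleclick-style link).
            url = query
            continue
        # Otherwise look for a parameter whose decoded value is a URL.
        nested = [value for _, value in parse_qsl(query)
                  if value.startswith(_SCHEMES)]
        if not nested:
            break  # nothing left to unwrap
        url = nested[0]
    return url
```

Called on the epartner URL from the output above, this should return http://www.elcorteingles.es/limite-48-horas/equipaje/?sorting=priceAsc&aff_id=2118094, and a URL with no embedded link comes back unchanged.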

Note: doing this may bypass the intended ad revenue.
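For short links that do redirect properly, the hop-following part of the question's function can probably be reduced to a single requests call. The following is a minimal Python 3 sketch, not a tested replacement: the session and timeout parameters are assumptions added for illustration, and the GET fallback assumes that some servers answer HEAD with an error status.

```python
import requests


def unshorten_url(url, session=None, timeout=10):
    """Follow HTTP redirects and return the final URL (illustrative sketch)."""
    if session is None:
        session = requests.Session()
    # HEAD avoids downloading the body; requests does not follow redirects
    # for HEAD unless allow_redirects=True is passed explicitly.
    response = session.head(url, allow_redirects=True, timeout=timeout)
    if response.status_code >= 400:
        # Assumed fallback: retry with GET when HEAD is rejected.
        response = session.get(url, allow_redirects=True,
                               timeout=timeout, stream=True)
        response.close()  # the body is not needed, only the final URL
    return response.url
```

This only follows real HTTP redirects, so for a link like the chollo.to example it would stop at the epartner URL shown in the output above.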

Regarding the Python short URL expander, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45663681/
