http - Why does curl not work, but wget works?

I am using both curl and wget to fetch this URL: http://opinionator.blogs.nytimes.com/2012/01/19/118675/

With curl, I get no output at all, but with wget I get the entire HTML source.

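Here are the two commands. I used the same user agent for both, both run from the same IP address, and both follow redirects. The URL is exactly the same. curl returns almost immediately, after about 1 second, so I know this is not a timeout issue.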

curl -L -s "http://opinionator.blogs.nytimes.com/2012/01/19/118675/" --max-redirs 10000 --location --connect-timeout 20 -m 20 -A "Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" 2>&1

wget http://opinionator.blogs.nytimes.com/2012/01/19/118675/ --user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"

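If the NY Times is cloaking the real content and not returning the source to curl, what is different in the headers that curl sends? I assumed that, since the user agent is the same, the two requests should look exactly alike. What other "footprints" should I check?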

Best answer

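The way to solve this is to analyze your curl request by running curl -v ... and your wget request by running wget -d ... Doing so shows that curl is being redirected to a login page: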

> GET /2012/01/19/118675/ HTTP/1.1
> User-Agent: Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
> Host: opinionator.blogs.nytimes.com
> Accept: */*
> 
< HTTP/1.1 303 See Other
< Date: Wed, 08 Jan 2014 03:23:06 GMT
* Server Apache is not blacklisted
< Server: Apache
< Location: http://www.nytimes.com/glogin?URI=http://opinionator.blogs.nytimes.com/2012/01/19/118675/&OQ=_rQ3D0&OP=1b5c69eQ2FCinbCQ5DzLCaaaCvLgqCPhKP
< Content-Length: 0
< Content-Type: text/plain; charset=UTF-8

This is followed by a redirect loop (which you must have noticed, since you already set the --max-redirs flag).
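To see the redirect chain at a glance, the verbose trace can be reduced to its status lines and Location headers. This is only a minimal sketch: summarize_redirects is a hypothetical helper name, and it assumes the "< " prefix that curl -v puts on response header lines, as in the trace above.

```shell
# Reads `curl -v` output on stdin and prints every response status line,
# followed by its Location header when there is one.
# summarize_redirects is a made-up name for illustration.
summarize_redirects() {
  # Header lines end in CR LF, so drop the CR first.
  tr -d '\r' | sed -n \
    -e 's/^< \(HTTP\/[0-9.]* [0-9][0-9][0-9].*\)$/\1/p' \
    -e 's/^< [Ll]ocation: \(.*\)$/  -> \1/p'
}

# Usage (needs network access, so it is left as a comment):
#   curl -v -L -s -o /dev/null "$url" 2>&1 | summarize_redirects
```

If the same login URL keeps reappearing in the output, the request is stuck in the loop described above.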

wget, on the other hand, follows the same sequence, except that it sends the cookie set by nytimes.com back with its subsequent requests:

---request begin---
GET /2012/01/19/118675/?_r=0 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: */*
Host: opinionator.blogs.nytimes.com
Connection: Keep-Alive
Cookie: NYT-S=0MhLY3awSMyxXDXrmvxADeHDiNOMaMEZFGdeFz9JchiAIUFL2BEX5FWcV.Ynx4rkFI

The requests sent by curl never include the cookie.
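One way to spot this kind of difference without reading both traces by eye is to reduce each log to the set of request header names and compare the two sets. The sketch below assumes the log formats shown above ("> " prefixes in curl -v output, a "---request begin---" marker in wget -d output); both function names are hypothetical.

```shell
# Sorted, lowercased request header names found in a `curl -v` log (stdin).
curl_request_headers() {
  tr -d '\r' \
    | sed -n 's/^> \([A-Za-z0-9-]*\): .*/\1/p' \
    | tr '[:upper:]' '[:lower:]' | sort -u
}

# Same for a `wget -d` log, where each request sits between the
# "---request begin---" and "---request end---" markers.
wget_request_headers() {
  tr -d '\r' \
    | sed -n '/^---request begin---$/,/^---request end---$/s/^\([A-Za-z0-9-]*\): .*/\1/p' \
    | tr '[:upper:]' '[:lower:]' | sort -u
}

# Usage (needs network access, so it is left as a comment):
#   curl -v -L -s -o /dev/null -A "$ua" "$url" 2>&1 | curl_request_headers > curl.hdrs
#   wget -d -O /dev/null --user-agent="$ua" "$url" 2>&1 | wget_request_headers > wget.hdrs
#   comm -13 curl.hdrs wget.hdrs    # header names that only wget sent
```

For the two requests quoted in this answer, the comparison would list connection and cookie as the headers only wget sent, and the cookie is the one that matters here.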

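The easiest way I see to modify your curl command and obtain the desired resource is to add -c cookiefile to it. This stores cookies in an otherwise unused temporary "cookie jar" file named "cookiefile", and it enables curl to send the needed cookies with its subsequent requests.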

For example, I added the flag -c x directly after "curl" and got the same output as with wget (except that wget writes it to a file while curl prints it to STDOUT).
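Put together, the corrected command might look like the sketch below. fetch_with_cookies is a hypothetical wrapper, the redirect limit of 20 is an arbitrary choice, and using mktemp for a throwaway jar is an assumption for illustration. The answer itself only adds -c with a file name.

```shell
# Fetch a URL with redirects followed and curl's cookie engine enabled.
# fetch_with_cookies is a made-up helper name, not part of curl.
fetch_with_cookies() {
  url=$1
  # Default to the user agent from the question; pass a second argument to override.
  ua=${2:-"Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"}
  jar=$(mktemp) || return 1    # throwaway cookie jar file

  # -c turns on cookie handling, so cookies set by one response are
  # sent on the following redirected requests; the jar is written on exit.
  curl -L -s --max-redirs 20 --connect-timeout 20 -m 20 \
       -A "$ua" -c "$jar" "$url"
  status=$?

  rm -f "$jar"
  return "$status"
}

# Usage (needs network access, so it is left as a comment):
#   fetch_with_cookies "http://opinionator.blogs.nytimes.com/2012/01/19/118675/" > page.html
```

A small redirect limit keeps a genuine loop from running for long. The page in question needed only a few hops once the cookie was sent back.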

Source: the question "Why does curl not work, but wget works?" on Stack Overflow: https://stackoverflow.com/questions/20986395/
