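So I am trying to scrape this URL with cURL: xxxx.fr, but I can't get the page's HTML code. The header and the body both come back empty, even though the HTTP code returned is 200. I tried other URLs (on different domains) and it works like a charm. I also tried different User-Agent and Referer values.
Do you know what's wrong? Or could someone at least try this code on their own server and tell me whether you get the same problem?
Thanks
Here is my code: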
$url = 'http://www.xxxx.fr';
$header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: timeout=5, max=100";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
$header[] = ""; // BROWSERS USUALLY LEAVE BLANK
$curl = curl_init ();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0");
curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($curl, CURLOPT_REFERER, "http://www.google.fr");
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLINFO_HEADER_OUT, 1);
curl_setopt($curl, CURLOPT_VERBOSE, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, getcwd().'/cookies.txt');
curl_setopt($curl, CURLOPT_COOKIEJAR, getcwd().'/cookies.txt');
curl_setopt($curl, CURLOPT_TIMEOUT, 30);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$curlData = curl_exec($curl);
$infos = curl_getinfo($curl);
print_r($infos);
curl_close ( $curl );
echo "<hr>Page:<br />";
echo htmlentities($curlData);
Here is the output of print_r($infos):
Array (
[url] => http://www.xxxx.fr
[content_type] => text/html
[http_code] => 200
[header_size] => 625
[request_size] => 465
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 0.032535
[namelookup_time] => 0.001488
[connect_time] => 0.002581
[pretransfer_time] => 0.002639
[size_upload] => 0
[size_download] => 10234
[speed_download] => 314553
[speed_upload] => 0
[download_content_length] => -1
[upload_content_length] => 0
[starttransfer_time] => 0.032088
[redirect_time] => 0
[certinfo] => Array ( )
[primary_ip] => xxx
[primary_port] => 80
[local_ip] => xxx
[local_port] => 37319
[redirect_url] =>
[request_header] => GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Host: www.xxxx.fr
Accept-Encoding: gzip,deflate
Referer: http://www.google.fr
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Cache-Control: max-age=0
Connection: keep-alive
Keep-Alive: timeout=5, max=100
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Language: en-us,en;q=0.5
)
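The output above reports size_download of 10234 bytes and header_size of 625, so the transfer itself appears to have delivered data. A minimal sketch for checking whether the content is instead lost when it is escaped for display, assuming that is where the problem lies. describeBody() is a hypothetical helper, not part of the original code, and the sample string only stands in for $curlData.

```php
<?php
// Hypothetical helper (not from the original post): summarize a raw
// response before it is escaped for display.
function describeBody(string $body): array
{
    return [
        // Raw byte count of the string, e.g. what curl_exec() returned.
        'bytes' => strlen($body),
        // preg_match() with the /u modifier returns false (not 1) when the
        // subject is not valid UTF-8.
        'valid_utf8' => preg_match('//u', $body) === 1,
        // Length after escaping as UTF-8 without ENT_SUBSTITUTE/ENT_IGNORE.
        'escaped_bytes' => strlen(htmlentities($body, ENT_QUOTES, 'UTF-8')),
    ];
}

// Stand-in for $curlData: a Latin-1 encoded snippet ("é" is the byte 0xE9).
print_r(describeBody("<p>Caf\xE9</p>"));
```

With the question's variables in scope, print_r(describeBody($curlData)) would be the equivalent call. A non-zero byte count next to an escaped length of 0 points at the escaping step rather than at cURL.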
Best answer
//EDIT
htmlentities($curlData) returns an empty string because the source is not UTF-8 encoded (see this link).
This should work:
htmlentities($curlData, ENT_QUOTES,'ISO-8859-1' );
In the PHP 5.4 release, htmlspecialchars() no longer uses ISO-8859-1 as its default encoding; as of PHP 5.4 it uses UTF-8. You might expect htmlspecialchars() to simply skip non-UTF-8 byte sequences or translate them to a "not found" character. In fact, it returns a blank string: no error is generated, no error code is returned and no exception is raised. If invalid UTF-8 sequences are passed in, you just get a blank string back.
htmlentities() follows the same rule, which is why the downloaded page appears empty.
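A small reproduction of the behavior described above, assuming the page is served in ISO-8859-1 (as the answer does). The flags are passed explicitly because, as of PHP 8.1, htmlentities() defaults to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, and ENT_SUBSTITUTE replaces invalid sequences with U+FFFD instead of returning an empty string; without ENT_SUBSTITUTE or ENT_IGNORE the empty-string result shows up on any PHP version from 5.4 on.

```php
<?php
// Bytes as they might arrive from a page served in ISO-8859-1:
// "é" is the single byte 0xE9, which is not valid UTF-8 on its own.
$latin1 = "<p>Caf\xE9</p>";

// Escaping as UTF-8 without ENT_SUBSTITUTE/ENT_IGNORE:
// the invalid byte makes htmlentities() return an empty string.
$asUtf8 = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

// Declaring the actual source charset, as in the answer's fix.
$asLatin1 = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

var_dump($asUtf8);    // string(0) ""
echo $asLatin1, "\n"; // &lt;p&gt;Caf&eacute;&lt;/p&gt;
```

Hard-coding 'ISO-8859-1' only helps if the site really uses that charset; the print_r output shows content_type as plain text/html with no charset parameter, so the right value has to come from the page itself or from inspection.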
On "PHP cURL returns empty header and body despite HTTP code 200", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30189680/