PHP cURL returns an empty header and body despite HTTP code 200

Tags: php string http curl

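I'm trying to scrape this URL with cURL: xxxx.fr, but I can't get at the page's HTML; both the header and the body come back empty, even though the HTTP code returned is 200. The same code works like a charm with other URLs (different domains). I have also tried different User-Agent and Referer values.

Do you know what is wrong? Could someone at least try this code on their own server and tell me whether they hit the same problem?

Thanks

Here is my code: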

  $url = 'http://www.xxxx.fr';

  $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
  $header[] = "Cache-Control: max-age=0";
  $header[] = "Connection: keep-alive";
  $header[] = "Keep-Alive: timeout=5, max=100";
  $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
  $header[] = "Accept-Language: en-us,en;q=0.5";
  $header[] = ""; // BROWSERS USUALLY LEAVE BLANK

  $curl = curl_init ();
  curl_setopt($curl, CURLOPT_URL, $url);
  curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
  curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0");
  curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
  curl_setopt($curl, CURLOPT_REFERER, "http://www.google.fr");
  curl_setopt($curl, CURLOPT_HEADER, 1);
  curl_setopt($curl, CURLINFO_HEADER_OUT, 1);
  curl_setopt($curl, CURLOPT_VERBOSE, 1);
  curl_setopt($curl, CURLOPT_COOKIEFILE, getcwd().'/cookies.txt');
  curl_setopt($curl, CURLOPT_COOKIEJAR, getcwd().'/cookies.txt');
  curl_setopt($curl, CURLOPT_TIMEOUT, 30);
  curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
  $curlData = curl_exec($curl);

  $infos = curl_getinfo($curl);
  print_r($infos);

  curl_close ( $curl );

  echo "<hr>Page:<br />";
  echo htmlentities($curlData);

Here is the result of print_r($infos):

Array ( 
[url] => http://www.xxxx.fr 
[content_type] => text/html 
[http_code] => 200 
[header_size] => 625 
[request_size] => 465 
[filetime] => -1 
[ssl_verify_result] => 0
[redirect_count] => 0 
[total_time] => 0.032535 
[namelookup_time] => 0.001488 
[connect_time] => 0.002581 
[pretransfer_time] => 0.002639 
[size_upload] => 0 
[size_download] => 10234 
[speed_download] => 314553 
[speed_upload] => 0 
[download_content_length] => -1 
[upload_content_length] => 0 
[starttransfer_time] => 0.032088 
[redirect_time] => 0 
[certinfo] => Array ( ) 
[primary_ip] => xxx 
[primary_port] => 80 
[local_ip] => xxx 
[local_port] => 37319 
[redirect_url] => 
[request_header] => GET / HTTP/1.1 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0 Host: www.xxxx.fr Accept-Encoding: gzip,deflate Referer: http://www.google.fr Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Cache-Control: max-age=0 Connection: keep-alive Keep-Alive: timeout=5, max=100 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Accept-Language: en-us,en;q=0.5 
) 

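Best answer

// Edit

cURL did receive the page: the print_r output above reports header_size => 625 and size_download => 10234. What comes back empty is htmlentities($curlData), because the response is not valid UTF-8 (see the quoted explanation below).

This should work: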

 htmlentities($curlData, ENT_QUOTES,'ISO-8859-1' );

"In the PHP 5.4 release, htmlspecialchars() doesn't use ISO-8859-1 as the default encoding. In fact, htmlspecialchars() as of PHP 5.4 uses UTF-8. You might expect that htmlspecialchars() would just skip non-UTF-8 byte sequences or translate them to a 'not found' character. In fact, htmlspecialchars() returns a blank string: no error gets generated, no error code gets returned, no exception gets raised; just a blank string gets returned if invalid UTF-8 sequences get passed in."

The same default applies to htmlentities(), which is why the call in the question prints nothing.
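To make the behavior concrete, here is a minimal sketch that reproduces it without any network access. The sample string is hypothetical, and ISO-8859-1 is an assumption about the page's real encoding (the reported content_type is just text/html, with no charset). The last call shows ENT_SUBSTITUTE, a flag available since PHP 5.4, as a possible fallback when the encoding is unknown.

```php
<?php
// Hypothetical sample: "café <b>" encoded as ISO-8859-1, so "é" is the single
// byte 0xE9, which is not a valid UTF-8 sequence here.
$latin1 = "caf\xE9 <b>";

// Treated as UTF-8 (the default since PHP 5.4), the invalid byte makes the
// whole call return an empty string, with no warning or error.
$asUtf8 = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

// Declaring the actual source encoding (assumed ISO-8859-1) escapes it properly.
$asLatin1 = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

// Fallback: ENT_SUBSTITUTE replaces invalid sequences with U+FFFD instead of
// discarding the whole string.
$substituted = htmlentities($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($asUtf8);            // empty string
var_dump($asLatin1);          // caf&eacute; &lt;b&gt;
var_dump($substituted !== ''); // true
```

If the output page itself is served as UTF-8, converting the response to UTF-8 before escaping may be the cleaner choice; which option fits depends on how the result is displayed.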

This question about PHP cURL returning an empty header and body despite HTTP code 200 comes from Stack Overflow: https://stackoverflow.com/questions/30189680/
