php - Finding URL redirects using HTTP headers and curl?

Tags: php redirect curl http-headers

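I am trying to write a redirect checker that tests whether a URL is search-engine friendly. It has to check whether the URL is redirected and, if it is, tell whether the redirect is SEO-friendly (301 status code) or not (302/304).

Here is something similar that I found: http://www.webconfs.com/redirect-check.php

It should also be able to follow multiple redirects (for example from A to B to C) and tell me that A redirects to C.

This is what I have so far, but it does not work properly (for example, when I enter www.example.com it does not find the redirect to www.example.com/page1):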

<?php
// You can edit the messages of the respective code over here
$httpcode  = array();
$httpcode["200"] = "Ok";
$httpcode["201"] = "Created";
$httpcode["302"] = "Found";
$httpcode["301"] = "Moved Permanently";
$httpcode["304"] = "Not Modified";
$httpcode["400"] = "Bad Request";


if(count($_POST)>0)
{
    $url = $_POST["url"];
    $curlurl = "http://".$url."/";
    $ch = curl_init();
    // Set URL to download
    curl_setopt($ch, CURLOPT_URL, $curlurl);

    // User agent
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]);
    // Include header in result? (0 = no, 1 = yes)
    curl_setopt($ch, CURLOPT_HEADER, 0);

    // Should cURL return or print out the data? (true = return, false = print)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    // Timeout in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);

    // Download the given URL, and return output
    $output = curl_exec($ch);

    $curlinfo = curl_getinfo($ch);

    if(($curlinfo["http_code"]=="301") || ($curlinfo["http_code"]=="302"))
    {
        $ch = curl_init();
        // Set URL to download
        curl_setopt($ch, CURLOPT_URL, $curlurl);

        // User agent
        curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]);
        // Include header in result? (0 = no, 1 = yes)
        curl_setopt($ch, CURLOPT_HEADER, 0);

        // Should cURL return or print out the data? (true = return, false = print)
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        // Timeout in seconds
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);


        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        // Download the given URL, and return output
        $output = curl_exec($ch);

        $curlinfo = curl_getinfo($ch);
        echo $url." is redirected to ".$curlinfo["url"];
    }
    else
    {
        echo $url." is not getting redirected";
    }

    // Close the cURL resource, and free system resources
    curl_close($ch);
}
?>
<form action="" method="post">
http://<input type="text" name="url" size="30" />/ <b>e.g. www.google.com</b><br/>
<input type="submit" value="Submit" />
</form>

Best answer

Well, if you want to record every redirect, you have to implement it yourself and turn off the automatic "location following":

function curl_trace_redirects($url, $timeout = 15) {

    $result = array();
    $ch = curl_init();

    $trace = true;
    $currentUrl = $url;

    $urlHist = array();
    while($trace && $timeout > 0 && !isset($urlHist[$currentUrl])) {
        $urlHist[$currentUrl] = true;

        curl_setopt($ch, CURLOPT_URL, $currentUrl);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
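        // CURLOPT_TIMEOUT takes whole seconds and 0 means "no timeout",
        // so round the remaining (possibly fractional) budget up to at least 1
        curl_setopt($ch, CURLOPT_TIMEOUT, max(1, (int) ceil($timeout)));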

        $output = curl_exec($ch);

        if($output === false) {
            $traceItem = array(
                'errorno' => curl_errno($ch),
                'error' => curl_error($ch),
            );

            $trace = false;
        } else {
            $curlinfo = curl_getinfo($ch);

            if(isset($curlinfo['total_time'])) {
                $timeout -= $curlinfo['total_time'];
            }

            if(!isset($curlinfo['redirect_url'])) {
                $curlinfo['redirect_url'] = get_redirect_url($output);
            }

            if(!empty($curlinfo['redirect_url'])) {
                $currentUrl = $curlinfo['redirect_url'];
            } else {
                $trace = false;
            }

            $traceItem = $curlinfo;
        }

        $result[] = $traceItem;
    }

    if($timeout < 0) {
        $result[] = array('timeout' => $timeout);
    }

    curl_close($ch);

    return $result;
}

// apparently 'redirect_url' is not available on all curl-versions
// so we fetch the location header ourselves
function get_redirect_url($header) {
    // whitespace after the colon is optional in HTTP headers, hence \s*
    if(preg_match('/^Location:\s*(.*)$/mi', $header, $m)) {
        return trim($m[1]);
    }

    return "";
}

Then you use it like this:

$res = curl_trace_redirects("http://www.example.com");
foreach($res as $item) {
    if(isset($item['timeout'])) {
        echo "Timeout reached!\n";
    } else if(isset($item['error'])) {
        echo "error: ", $item['error'], "\n";
    } else {
        echo $item['url'];
        if(!empty($item['redirect_url'])) {
            // redirection
            echo " -> (", $item['http_code'], ")";
        }

        echo "\n";
    }
}

My code may not be fully thought through, but I think it is a good start.
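One gap in the fallback is that a raw Location header may hold a relative value such as /page1, which cannot be fed straight back into CURLOPT_URL. The following is a minimal sketch of one way to resolve it against the URL that was just requested. The name resolve_location is hypothetical and not part of the answer's code; it only covers the common cases and does not normalize "../" segments or query-only references.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// turns a Location value into an absolute URL based on the requested URL.
function resolve_location($baseUrl, $location) {
    // Already absolute, e.g. "http://www.example.com/page1"
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }

    $base = parse_url($baseUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) {
        return $location; // base URL is unusable, return the value unchanged
    }

    // Scheme-relative, e.g. "//cdn.example.com/x"
    if (strpos($location, '//') === 0) {
        return $base['scheme'] . ':' . $location;
    }

    $origin = $base['scheme'] . '://' . $base['host'];
    if (isset($base['port'])) {
        $origin .= ':' . $base['port'];
    }

    // Root-relative, e.g. "/page1"
    if (strpos($location, '/') === 0) {
        return $origin . $location;
    }

    // Path-relative: replace everything after the last "/" of the base path
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}

echo resolve_location('http://www.example.com', '/page1'), "\n";
echo resolve_location('http://www.example.com/dir/a.html', 'b.html'), "\n";
```

In the tracing loop this would wrap the fallback only, for example resolve_location($currentUrl, get_redirect_url($output)), since a redirect_url reported by curl itself should already be absolute.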

Edit

Here is some sample output:

http://midas/~stefan/test/redirect/fritzli.html -> (302)
http://midas/~stefan/test/redirect/hansli.html -> (301)
http://midas/~stefan/test/redirect/heiri.html
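
To get the single verdict the question asks for (does A end up at C, and was every hop a 301?), a trace like the one above can be reduced to a short summary. This is a minimal sketch, assuming the item shape returned by curl_trace_redirects (keys url, http_code, redirect_url). The name summarize_trace and the hand-typed sample array are hypothetical, and the final 200 status is an assumption, since the sample output does not show it.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// reduces a trace from curl_trace_redirects() to one summary array.
function summarize_trace(array $trace) {
    $summary = array(
        'start'    => null,    // first URL requested
        'final'    => null,    // last URL that returned a response
        'hops'     => array(), // status code of every redirecting response
        'all_301'  => true,    // true if every redirect hop was a 301
        'complete' => true,    // false if an error or timeout cut the trace short
    );

    foreach ($trace as $item) {
        // Error and timeout entries carry no 'url' key
        if (!isset($item['url'])) {
            $summary['complete'] = false;
            break;
        }

        if ($summary['start'] === null) {
            $summary['start'] = $item['url'];
        }
        $summary['final'] = $item['url'];

        if (!empty($item['redirect_url'])) {
            $code = (int) $item['http_code'];
            $summary['hops'][] = $code;
            // Following the question's rule: only 301 counts as SEO-friendly
            if ($code !== 301) {
                $summary['all_301'] = false;
            }
        }
    }

    return $summary;
}

// Trace shaped like the sample output above, typed in by hand
$trace = array(
    array('url' => 'http://midas/~stefan/test/redirect/fritzli.html', 'http_code' => 302,
          'redirect_url' => 'http://midas/~stefan/test/redirect/hansli.html'),
    array('url' => 'http://midas/~stefan/test/redirect/hansli.html', 'http_code' => 301,
          'redirect_url' => 'http://midas/~stefan/test/redirect/heiri.html'),
    array('url' => 'http://midas/~stefan/test/redirect/heiri.html', 'http_code' => 200,
          'redirect_url' => ''),
);

$s = summarize_trace($trace);
echo $s['start'], " ends at ", $s['final'], " (hops: ", implode(", ", $s['hops']), ")\n";
echo $s['all_301'] ? "every hop is a 301\n" : "at least one hop is not a 301\n";
```

If the trace stopped because a URL repeated (a redirect loop), the last item still has a non-empty redirect_url, so 'final' is then only the last URL visited, not a real destination.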

Regarding "php - Finding URL redirects using HTTP headers and curl?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8534014/
