php - How can I create a function that repeats itself until it has found all the information?

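I want to create a PHP function that goes through a website's homepage, finds all the links on it, follows each link it finds, and keeps going until every link on the site has been followed to the end. I really need to build something like this so I can spider my network of sites and provide a "one stop" search.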

Here is what I have so far:

function spider($urltospider, $current_array = array(), $ignore_array = array('')) {
    if(empty($current_array)) {
        // Make the request to the original URL
        $session = curl_init($urltospider);
        curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
        $html = curl_exec($session);
        curl_close($session);
        if($html != '') {
            $dom = new DOMDocument();
            @$dom->loadHTML($html);
            $xpath = new DOMXPath($dom);
            $hrefs = $xpath->evaluate("/html/body//a");
            for($i = 0; $i < $hrefs->length; $i++) {
                $href = $hrefs->item($i);
                $url = $href->getAttribute('href');
                if(!in_array($url, $ignore_array) && !in_array($url, $current_array)) {
                    // Add this URL to the current spider array
                    $current_array[] = $url;
                }
            }               
        } else {
            die('Failed connection to the URL');
        }
    } else {
        // There are already URLs in the current array
        foreach($current_array as $url) {
            // Connect to this URL

            // Find all the links in this URL

            // Go through each URL and get more links
        }
    }
}

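The only problem is that I can't figure out how to proceed. Can anyone help me? Basically, this function should repeat itself until everything has been found.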

Best answer

I'm not a PHP expert, but you seem to be overcomplicating this.

function spider($urltospider, $current_array = array(), $ignore_array = array('')) {
    if (empty($current_array)) {
        $current_array[] = $urltospider;
    }
    $cur_crawl = 0;
    // Don't use foreach here: the array grows inside the loop,
    // and a by-value foreach would not visit the newly added links.
    while ($cur_crawl < count($current_array)) {
        // crawl() is not defined here; it should return an array of all links found on the given page.
        $links_found = crawl($current_array[$cur_crawl]);
        // Append only links that are new and not ignored, so no page is crawled more than once.
        foreach ($links_found as $link) {
            if (!in_array($link, $current_array) && !in_array($link, $ignore_array)) {
                $current_array[] = $link;
            }
        }
        $cur_crawl += 1;
    }
    return $current_array;
}
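
The loop above relies on a crawl() helper that the answer leaves undefined. Below is a minimal sketch of one way to fill that gap, not a definitive implementation. The names extract_links(), spider_all() and the $fetch callable are hypothetical and do not come from the original post. The sketch assumes links are compared as raw href strings, so it does not resolve relative URLs or filter out links to other sites.

```php
<?php
// Hypothetical helper: return the href of every <a> element in an HTML string.
function extract_links(string $html): array {
    if (trim($html) === '') {
        return [];
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Worklist crawl. $fetch is an assumed callable that maps a URL to its HTML
// and returns '' on failure. Returns every URL discovered, in visiting order.
function spider_all(string $start, callable $fetch, array $ignore = []): array {
    $queue = [$start];
    $seen = [$start => true]; // set of URLs already queued, for O(1) lookups
    for ($i = 0; $i < count($queue); $i++) {
        foreach (extract_links($fetch($queue[$i])) as $link) {
            if (!isset($seen[$link]) && !in_array($link, $ignore, true)) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $queue;
}
```

Passing the fetcher in as a callable keeps the crawl logic separate from the HTTP code, so the curl calls from the question could be wrapped in a closure and passed as $fetch, while a lookup in an in-memory array is enough to try the loop out without touching the network.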

For more on this topic, see the similar question on Stack Overflow: https://stackoverflow.com/questions/3274504/
