php - file_get_contents from multiple URLs

Tags: php arrays url save file-get-contents

I want to save the page contents from multiple URLs to files.

First, I have the site URL in an array:

$site = array( 
        'url' => 'http://onesite.com/index.php?c='.$row['code0'].'&o='.$row['code1'].'&y='.$row['code2'].'&a='.$row['cod3'].'&sid=', 'selector' => 'table.tabel tr'
    );  

To save the files, I tried:

foreach($site  as $n) {
$referer = 'reffername';


$header[] = "Accept: text/xml,application/xml,application/json,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Cache-Control: max-age=0";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";

$opts = array('http'=>array('method'=>"GET",
                            'header'=>implode('\r\n',$header)."\r\n".
                            "Referer: $referer\r\n",
                            'user_agent'=> "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.2) Gecko/2008092313 Ubuntu/9.25 (jaunty) Firefox/3.8"));
$context = stream_context_create($opts);

$data = file_get_contents($site["url"], false, $context);

$file = md5('$id');

file_put_contents($file, $data);
$content = unserialize(file_get_contents($file));
}

Best answer

A basic cURL multi script:

// Your array of URLs pointing to the files to download
$urls = array(); 

// cURL multi-handle
$mh = curl_multi_init();

// Holds the cURL handle for each file
$requests = array();

$options = array(
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_AUTOREFERER    => true, 
    CURLOPT_USERAGENT      => 'paste your user agent string here',
    CURLOPT_HEADER         => false,
    CURLOPT_SSL_VERIFYPEER => false, // disables TLS certificate verification; set to true for untrusted networks
    CURLOPT_RETURNTRANSFER => true
);

// Output file stream for each file, keyed the same way as $requests
$fstreams = array();

$folder = 'content/';
if (!file_exists($folder)){ mkdir($folder, 0777, true); }

foreach ($urls as $key => $url)
{
    // Add initialized cURL object to array
    $requests[$key] = curl_init($url);

    // Set cURL object options
    curl_setopt_array($requests[$key], $options);

    // Extract the filename from the URL and build the local path
    $path     = parse_url($url, PHP_URL_PATH);
    $filename = pathinfo($path, PATHINFO_FILENAME).'-'.$key; // Or whatever you want
    $filepath = $folder.$filename;

    // Open a filestream for each file and assign it to corresponding cURL object
    $fstreams[$key] = fopen($filepath, 'w');
    curl_setopt($requests[$key], CURLOPT_FILE, $fstreams[$key]);

    // Add cURL object to multi-handle
    curl_multi_add_handle($mh, $requests[$key]);
}

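// Run the transfers until all requests have completed
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        // Wait for activity on any connection instead of busy-looping
        curl_multi_select($mh);
    }
} while ($active && $status == CURLM_OK);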

// Collect all data here and clean up
foreach ($requests as $key => $request) {

    // $returned[$key] = curl_multi_getcontent($request); // Use this instead if you are not downloading into a file; also remove the CURLOPT_FILE option and the $fstreams array
    curl_multi_remove_handle($mh, $request); // Detach the handle from the multi-handle
    curl_close($request);                    // Must come AFTER curl_multi_getcontent()
    fclose($fstreams[$key]);                 // Close the output file
}

curl_multi_close($mh);
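
If you would rather stay with file_get_contents and fetch the pages one after another, the snippet in the question has a few problems worth noting: '\r\n' in single quotes is not a line break, the loop iterates over $site but always reads $site["url"], md5('$id') hashes the literal string $id so every page overwrites the same file, and unserialize() is applied to raw HTML. Below is a minimal sketch of a sequential version under those assumptions. The function names build_url, build_http_options and save_pages are hypothetical helpers, not part of the original post; the $row keys are copied from the question as written.

```php
<?php
// Sketch only: all helper names here are hypothetical.

// Build one page URL from a database row, URL-encoding each value.
function build_url(array $row): string
{
    $query = http_build_query(array(
        'c'   => $row['code0'],
        'o'   => $row['code1'],
        'y'   => $row['code2'],
        'a'   => $row['cod3'], // key spelled as in the question
        'sid' => '',
    ));
    return 'http://onesite.com/index.php?' . $query;
}

// Build the stream context options. Header lines must be joined with a
// double-quoted "\r\n"; in single quotes PHP keeps the backslashes literally.
function build_http_options(array $headers, string $referer, string $userAgent): array
{
    $headers[] = 'Referer: ' . $referer;
    return array('http' => array(
        'method'     => 'GET',
        'header'     => implode("\r\n", $headers) . "\r\n",
        'user_agent' => $userAgent,
    ));
}

// Fetch every URL in turn and save each body to its own file in $folder.
// Returns a map of URL => saved path; URLs that failed are left out.
function save_pages(array $urls, string $folder, array $options): array
{
    if (!is_dir($folder)) {
        mkdir($folder, 0755, true);
    }
    $context = stream_context_create($options);
    $saved = array();
    foreach ($urls as $url) {
        $data = @file_get_contents($url, false, $context);
        if ($data === false) {
            continue; // request failed; skip this URL
        }
        // Hash the URL itself so every page gets a distinct file name
        $path = rtrim($folder, '/') . '/' . md5($url);
        file_put_contents($path, $data);
        $saved[$url] = $path;
    }
    return $saved;
}
```

A typical call would collect build_url($row) for every row into an array and pass it to save_pages($urls, 'content', $options). This version is simpler than the cURL multi script but downloads sequentially, so it will be slower for many URLs.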

Regarding php - file_get_contents from multiple URLs, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21362362/
