php - Refactoring a loop?

Tags: php algorithm refactoring large-data

I want to loop over more than 200,000 user settings to filter 30,000 products. How can I optimize this big nested loop for the best performance?

   // settings: at most 5 per user, up to 200,000 in total
   $settings = array(...);

   // all products, up to 30,000
   $prods = array(...);

   // product-category relation map for all products, up to 2 * 30,000
   $prods_cate_ref_all = array(...);

   // messages matched by the settings and saved yesterday, more than 100 * 200,000
   $msg_all = array(...);

   // result counter
   $j = 0;

   // filter result
   $res = array();

   foreach ($settings as $set) {

       foreach ($prods as $p) {

           // filter products by site_id
           if ($set['site_id'] != $p['site_id']) continue;

           // filter products by city_id; city_id == 0 means nationwide
           if ($set['city_id'] != $p['city_id'] && $p['city_id'] > 0) continue;

           // multiple settings of one user may match the same product
           if (prod_in($p['id'], $set['uuid'], $res)) continue;

           // skip products already matched and saved to the msg table yesterday
           if (msg_in($p['id'], $set['uuid'], $msg_all)) continue;

           // filter products by category id
           if (!prod_cate_in($p['id'], $set['cate_id'], $prods_cate_ref_all)) continue;

           // skip the product if none of the setting's tags appears in its title, website, ...
           $arr = array($p['title'], $p['website'], $p['detail'], $p['shop'], $p['tags']);
           if (!tags_in($set['tags'], $arr)) continue;

           $res[$j]['name'] = $set['name'];
           $res[$j]['prod_id'] = $p['id'];
           $res[$j]['uuid'] = $set['uuid'];
           $res[$j]['msg'] = '...';
           $j++;
       }

   }

   save_to_msg($res);

function prod_in($prod_id, $uuid, $prod_all){
    foreach ($prod_all as $v) {
        if ($v['prod_id'] == $prod_id && $v['uuid'] == $uuid)
            return true;
    }
    return false;
}

function prod_cate_in($prod_id, $cate_id, $prod_cate_all){
    foreach ($prod_cate_all as $v) {
        if ($v['prod_id'] == $prod_id && $v['cate_id'] == $cate_id)
            return true;
    }
    return false;
}

function tags_in($tags, $arr){
    // normalize the full-width comma "，" to "," before splitting
    $tag_arr = explode(',', str_replace('，', ',', $tags));
    foreach ($tag_arr as $v) {
        $v = trim($v);
        // on PHP 8 an empty needle matches any string, so skip empty tags
        if ($v === '') continue;
        foreach ($arr as $a) {
            // case-insensitive substring match
            if (stripos($a, $v) !== false) {
                return true;
            }
        }
    }
    return false;
}

function msg_in($prod_id, $uuid, $msg_all){
    foreach ($msg_all as $v) {
        if ($v['prod_id'] == $prod_id && $v['uuid'] == $uuid)
            return true;
    }
    return false;
}
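The three helpers `prod_in`, `msg_in` and `prod_cate_in` each scan a whole array on every call, and they run inside the 200,000 × 30,000 nested loop, so the scans dominate the run time. A common remedy is to index the rows once in a PHP associative array keyed by the two columns, so each check becomes a single `isset` lookup. This is a sketch, assuming rows shaped like the ones `msg_in` reads (`uuid`, `prod_id`); `build_pair_index` and `pair_in` are hypothetical names.

```php
<?php
// Hypothetical helper: index rows by two columns so that membership
// becomes an isset() check instead of a linear scan.
function build_pair_index(array $rows, string $outer, string $inner): array
{
    $index = array();
    foreach ($rows as $row) {
        // The column values become array keys; the stored value is irrelevant.
        $index[$row[$outer]][$row[$inner]] = true;
    }
    return $index;
}

// Hypothetical replacement for msg_in()/prod_in()/prod_cate_in().
function pair_in(array $index, $outer_key, $inner_key): bool
{
    return isset($index[$outer_key][$inner_key]);
}

// A few rows shaped like $msg_all.
$msg_all = array(
    array('uuid' => 312, 'prod_id' => 211),
    array('uuid' => 1227, 'prod_id' => 31),
    array('uuid' => 312, 'prod_id' => 72),
);
$msg_index = build_pair_index($msg_all, 'uuid', 'prod_id');

var_dump(pair_in($msg_index, 312, 72)); // bool(true)
var_dump(pair_in($msg_index, 312, 31)); // bool(false)
```

The same structure would serve `prod_cate_in` (index by `prod_id`, then `cate_id`) and the duplicate check against `$res` (set `$index[$uuid][$prod_id] = true` each time a result is added). The trade-off is memory: a nested PHP array with millions of entries typically uses considerably more memory than one packed string per key, which is the direction the accepted answer takes for `$msg_all`.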

Update: Thanks a lot. Yes, the data is in MySQL; here is the main structure:

-- user settings used to filter products, at most 5 per user
CREATE TABLE setting(
   id INT NOT NULL AUTO_INCREMENT, 
   uuid VARCHAR(100) NOT NULL DEFAULT '',
   tags VARCHAR(100) NOT NULL DEFAULT '',
   site_id SMALLINT UNSIGNED NOT NULL DEFAULT 0,
   city_id MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
   cate_id MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
   addtime INT UNSIGNED NOT NULL DEFAULT 0,
   PRIMARY KEY (`id`), 
   KEY `idx_setting_uuid` (`uuid`),
   KEY `idx_setting_tags` (`tags`),
   KEY `idx_setting_city_id` (`city_id`),
   KEY `idx_setting_cate_id` (`cate_id`)
) DEFAULT CHARSET=utf8;


CREATE TABLE users(
   id INT NOT NULL AUTO_INCREMENT, 
   uuid VARCHAR(100) NOT NULL DEFAULT '',
   PRIMARY KEY (`id`),   
   UNIQUE KEY `idx_unique_uuid` (`uuid`)
) DEFAULT CHARSET=utf8;


-- filtered products
CREATE TABLE msg_list(
   id INT NOT NULL AUTO_INCREMENT,
   uuid VARCHAR(100) NOT NULL DEFAULT '',
   prod_id INT UNSIGNED NOT NULL DEFAULT 0,
   msg TEXT NOT NULL,
   PRIMARY KEY (`id`),
   KEY `idx_ml_uuid` (`uuid`)
) DEFAULT CHARSET=utf8;



-- the prod and prod_cate_ref tables are in another database, so they cannot be joined


CREATE TABLE prod(
   id INT NOT NULL AUTO_INCREMENT, 
   website VARCHAR(100) NOT NULL DEFAULT '' COMMENT ' site name ',
   site_id MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
   city_id MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
   title VARCHAR(50) NOT NULL DEFAULT '',
   tags VARCHAR(50) NOT NULL DEFAULT '',
   detail VARCHAR(500) NOT NULL DEFAULT '',
   shop VARCHAR(300) NOT NULL DEFAULT '',
   PRIMARY KEY (`id`),
   KEY `idx_prod_tags` (`tags`),
   KEY `idx_prod_site_id` (`site_id`),
   KEY `idx_prod_city_id` (`city_id`),
   KEY `idx_prod_mix` (`site_id`,`city_id`,`tags`)
) DEFAULT CHARSET=utf8;

CREATE TABLE prod_cate_ref(
   id MEDIUMINT NOT NULL AUTO_INCREMENT, 
   prod_id INT NOT NULL DEFAULT 0,
   cate_id MEDIUMINT NOT NULL DEFAULT 0,
   PRIMARY KEY (`id`),
   KEY `idx_pcr_mix` (`prod_id`,`cate_id`)
) DEFAULT CHARSET=utf8;


-- all tables use the MyISAM engine

I don't know how to get all of this with a single SQL query.

Best answer

Thanks, everyone, for the inspiration. I finally figured it out: it really is such a simple approach, yet a huge improvement!

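I regrouped the data in $prods_cate_ref_all and $msg_all (using the two functions at the end), as well as the result array $res, and then used strpos and in_array instead of the three iterating functions (prod_in, msg_in, prod_cate_in).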

I got an amazing 50x speedup! And the larger the data gets, the bigger the gain.

   // settings: at most 5 per user, up to 200,000 in total
   $settings = array(...);

   // all products, up to 30,000
   $prods = array(...);

   // product-category relation map, regrouped as prod_id => " cate_id cate_id ... "
   $prods_cate_ref_all = get_cate_ref_all();

   // messages matched yesterday, regrouped as uuid => " prod_id prod_id ... "
   $msg_all = get_msg_all();

   // filter result, grouped by uuid
   $res = array();

   foreach ($settings as $set) {

       $uuid = $set['uuid'];

       foreach ($prods as $p) {

           // filter products by site_id
           if ($set['site_id'] != $p['site_id'])
               continue;

           // filter products by city_id; city_id == 0 means nationwide
           if ($set['city_id'] != $p['city_id'] && $p['city_id'] > 0)
               continue;

           // multiple settings of one user may match the same product
           // (in_array is faster than strpos for fewer than 1,000 items)
           if (isset($res[$uuid]) && in_array($p['id'], $res[$uuid]['prod_ids']))
               continue;

           // skip products already matched and saved to the msg table yesterday
           // (strpos is faster than in_array on large data)
           if (isset($msg_all[$uuid]) && false !== strpos($msg_all[$uuid], ' ' . $p['id'] . ' '))
               continue;

           // filter products by category id; products without a category are skipped
           if (!isset($prods_cate_ref_all[$p['id']])
               || false === strpos($prods_cate_ref_all[$p['id']], ' ' . $set['cate_id'] . ' '))
               continue;

           $arr = array($p['title'], $p['website'], $p['detail'], $p['shop'], $p['tags']);
           if (!tags_in($set['tags'], $arr))
               continue;

           $res[$uuid]['prod_ids'][] = $p['id'];

           $res[$uuid][] = array(
               'name'    => $set['name'],
               'prod_id' => $p['id'],
               'msg'     => '',
           );
       }

   }
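The loop above still visits all 30,000 products for every setting, although the first check throws away every product from another site. One further step in the same spirit (regroup first, then loop) is to bucket the products by `site_id` once, so each setting only walks its own site's products, and to build each product's lowercase search text and each setting's tag list once instead of inside the inner loop. Keying the result by `[$uuid][$prod_id]` also turns the duplicate check into `isset` instead of `in_array`. This is a hedged sketch, not the author's code: `bucket_by_site`, `setting_tags`, `product_haystack` and `filter_products` are hypothetical names, and the `$msg_all` and category checks are omitted for brevity.

```php
<?php
// Group products by site_id once, so each setting only walks its own site.
function bucket_by_site(array $prods): array
{
    $by_site = array();
    foreach ($prods as $p) {
        $by_site[$p['site_id']][] = $p;
    }
    return $by_site;
}

// Split a setting's tags once: accept the full-width comma "，", trim,
// drop empty tags (they would match everything), and lowercase.
function setting_tags(string $tags): array
{
    $parts = array_map('trim', explode(',', str_replace('，', ',', $tags)));
    $parts = array_filter($parts, 'strlen');
    return array_values(array_map('strtolower', $parts));
}

// Build a product's lowercase search text once. Fields are joined with "\n"
// so a tag cannot match across two fields. With PHP's default "C" locale
// (and always since PHP 8.2) strtolower() only changes ASCII letters,
// so UTF-8 text such as Chinese passes through unchanged.
function product_haystack(array $p): string
{
    $fields = array($p['title'], $p['website'], $p['detail'], $p['shop'], $p['tags']);
    return strtolower(implode("\n", $fields));
}

function filter_products(array $settings, array $prods): array
{
    $by_site = bucket_by_site($prods);
    $haystacks = array();
    foreach ($prods as $p) {
        $haystacks[$p['id']] = product_haystack($p);
    }

    // $res[$uuid][$prod_id] = row
    $res = array();
    foreach ($settings as $set) {
        if (!isset($by_site[$set['site_id']])) {
            continue; // no products on this site
        }
        $uuid = $set['uuid'];
        $tags = setting_tags($set['tags']);

        foreach ($by_site[$set['site_id']] as $p) {
            // city_id == 0 means the product is available nationwide
            if ($p['city_id'] > 0 && $p['city_id'] != $set['city_id']) {
                continue;
            }
            // already matched by another setting of the same user
            if (isset($res[$uuid][$p['id']])) {
                continue;
            }
            // the $msg_all and category checks would go here
            foreach ($tags as $tag) {
                if (strpos($haystacks[$p['id']], $tag) !== false) {
                    $res[$uuid][$p['id']] = array(
                        'name'    => $set['name'],
                        'prod_id' => $p['id'],
                        'msg'     => '',
                    );
                    break;
                }
            }
        }
    }
    return $res;
}
```

With 200,000 settings spread over several sites, the inner loop then only covers the share of the 30,000 products that belongs to each setting's site; how much that saves depends on how the products are distributed across sites.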


function get_msg_all(){

    $temp = array();
    $msg_all = array(
        array('uuid' => 312, 'prod_id' => 211),
        array('uuid' => 1227, 'prod_id' => 31),
        array('uuid' => 1, 'prod_id' => 72),
        array('uuid' => 993, 'prod_id' => 332),
        ...
    );

    // regroup as uuid => " prod_id prod_id ... ", padded with spaces for strpos
    foreach ($msg_all as $v) {
        if (!isset($temp[$v['uuid']]))
            $temp[$v['uuid']] = ' ';

        $temp[$v['uuid']] .= $v['prod_id'] . ' ';
    }

    $msg_all = $temp;
    unset($temp);

    return $msg_all;
}


function get_cate_ref_all(){

    $temp = array();
    $cate_ref = array(
        array('prod_id' => 3, 'cate_id' => 21),
        array('prod_id' => 27, 'cate_id' => 1),
        array('prod_id' => 1, 'cate_id' => 232),
        array('prod_id' => 3, 'cate_id' => 232),
        ...
    );

    // regroup as prod_id => " cate_id cate_id ... ", padded with spaces for strpos
    foreach ($cate_ref as $v) {
        if (!isset($temp[$v['prod_id']]))
            $temp[$v['prod_id']] = ' ';

        $temp[$v['prod_id']] .= $v['cate_id'] . ' ';
    }
    $cate_ref = $temp;
    unset($temp);

    return $cate_ref;
}
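`get_msg_all()` and `get_cate_ref_all()` perform the same transformation with different column names: group rows by one column and pack the other column's values into one string with a space before and after every value. The padding is what makes the `strpos` test safe: searching for `' 23 '` cannot match inside `' 232 '`. A sketch of one shared helper, assuming rows shaped as in the two functions above; `group_as_padded_string` and `padded_contains` are hypothetical names.

```php
<?php
// Hypothetical helper: group rows by $key_col and collect $val_col values
// into a string such as " 21 232 ", padded with spaces on both sides.
function group_as_padded_string(array $rows, string $key_col, string $val_col): array
{
    $grouped = array();
    foreach ($rows as $row) {
        $key = $row[$key_col];
        if (!isset($grouped[$key])) {
            $grouped[$key] = ' ';
        }
        $grouped[$key] .= $row[$val_col] . ' ';
    }
    return $grouped;
}

// Membership test in the answer's style; a missing key simply means "not found".
function padded_contains(array $grouped, $key, $value): bool
{
    return isset($grouped[$key])
        && strpos($grouped[$key], ' ' . $value . ' ') !== false;
}

$cate_ref = group_as_padded_string(array(
    array('prod_id' => 3, 'cate_id' => 21),
    array('prod_id' => 3, 'cate_id' => 232),
    array('prod_id' => 27, 'cate_id' => 1),
), 'prod_id', 'cate_id');

var_dump($cate_ref[3]);                       // string(8) " 21 232 "
var_dump(padded_contains($cate_ref, 3, 23));  // bool(false)
var_dump(padded_contains($cate_ref, 3, 232)); // bool(true)
```

With such a helper, `get_msg_all()` would reduce to `group_as_padded_string($rows, 'uuid', 'prod_id')` and `get_cate_ref_all()` to `group_as_padded_string($rows, 'prod_id', 'cate_id')`, where `$rows` stands for whatever the database query returns.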

A similar question, "php - Refactoring a loop?", can be found on Stack Overflow: https://stackoverflow.com/questions/7216200/