python - H2O Python API : retrieve best models from GridSearch

Tags: python h2o

I am using the following code to run a grid search with H2O through the Python API:

from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.grid import H2OGridSearch

hyper_parameters = {'ntrees':[10, 50, 100, 200], 'max_depth':[5, 10, 15, 20, 25], 'balance_classes':[True, False]}

search_criteria = {
    "strategy": "RandomDiscrete",
    "max_runtime_secs": 600,
    "max_models": 30,
    "stopping_metric": 'AUTO',
    "stopping_tolerance": 0.0001,
    'seed': 42
}

grid_search = H2OGridSearch(H2ORandomForestEstimator, hyper_parameters, search_criteria=search_criteria)
grid_search.train(x=events_names_x, 
                  y="total_rsvps", 
                  training_frame=train,
                  validation_frame=test)

After it runs, I want to print the models in order of AUC and use them for prediction:

grid_search.sort_by('auc', False)

I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-272-b250bf2b838e> in <module>()
----> 1 grid_search.sort_by('auc', False)

/Users/stereo/.pyenv/versions/3.5.2/lib/python3.5/site-packages/h2o/grid/grid_search.py in sort_by(self, metric, increasing)
    663 
    664         if metric[-1] != ')': metric += '()'
--> 665         c_values = [list(x) for x in zip(*sorted(eval('self.' + metric + '.items()'), key=lambda k_v: k_v[1]))]
    666         c_values.insert(1, [self.get_hyperparams(model_id, display=False) for model_id in c_values[0]])
    667         if not increasing:

/Users/stereo/.pyenv/versions/3.5.2/lib/python3.5/site-packages/h2o/grid/grid_search.py in <module>()

/Users/stereo/.pyenv/versions/3.5.2/lib/python3.5/site-packages/h2o/grid/grid_search.py in auc(self, train, valid, xval)
    606         :return: The AUC.
    607         """
--> 608         return {model.model_id: model.auc(train, valid, xval) for model in self.models}
    609 
    610     def aic(self, train=False, valid=False, xval=False):

/Users/stereo/.pyenv/versions/3.5.2/lib/python3.5/site-packages/h2o/grid/grid_search.py in <dictcomp>(.0)
    606         :return: The AUC.
    607         """
--> 608         return {model.model_id: model.auc(train, valid, xval) for model in self.models}
    609 
    610     def aic(self, train=False, valid=False, xval=False):

/Users/stereo/.pyenv/versions/3.5.2/lib/python3.5/site-packages/h2o/model/model_base.py in auc(self, train, valid, xval)
    669         tm = ModelBase._get_metrics(self, train, valid, xval)
    670         m = {}
--> 671         for k, v in viewitems(tm): m[k] = None if v is None else v.auc()
    672         return list(m.values())[0] if len(m) == 1 else m
    673 

/Users/stereo/.pyenv/versions/3.5.2/lib/python3.5/site-packages/h2o/model/metrics_base.py in auc(self)
    158         :return: Retrieve the AUC for this set of metrics.
    159         """
--> 160         return self._metric_json['AUC']
    161 
    162     def aic(self):

KeyError: 'AUC'

Any suggestions on how to:

  • print the models in order of performance
  • make predictions with the model that has the highest AUC
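A note on the traceback: it ends in `self._metric_json['AUC']`, which suggests that the metrics stored for these models contain no AUC entry. One possible cause, assuming `total_rsvps` is a numeric column, is that the models were trained as regression models, for which AUC is not computed. The sketch below shows one way to choose a sort metric that matches the response type. The helper name `pick_sort_metric` and the metric choices are illustrative assumptions, not part of the H2O API.

```python
def pick_sort_metric(is_factor, n_levels):
    """Return (metric_name, decreasing) suited to the response column type.

    Hypothetical helper for illustration; the metric choices are assumptions.
    """
    if not is_factor:
        # Numeric response -> regression: no AUC, so rank by error (lower is better).
        return "rmse", False
    if n_levels == 2:
        # Two-level categorical response -> binomial: AUC applies (higher is better).
        return "auc", True
    # More than two levels -> multinomial: rank by log loss (lower is better).
    return "logloss", False


# With a running H2O cluster, the inputs could be read from the training frame
# (assumption: `train` and `grid_search` are the objects from the question):
#   is_factor = train["total_rsvps"].isfactor()[0]
#   n_levels = train["total_rsvps"].nlevels()[0]
#   metric, decreasing = pick_sort_metric(is_factor, n_levels)
#   sorted_grid = grid_search.get_grid(sort_by=metric, decreasing=decreasing)
```

If the response is meant to be a two-class label, converting it with `asfactor()` before training is the usual way to get a classification model, and with it an AUC value.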

Best answer

What you need is:

sorted_grid = grid_search.get_grid(sort_by='auc', decreasing=True)
print(sorted_grid)

You can change decreasing to False if you prefer ascending order.
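To cover the second part of the question, predicting with the top model, a minimal sketch follows. It assumes the grid has already been trained and that the first entry of `sorted_grid.models` is the best model under the chosen ordering. The function name `best_model_and_predictions` is hypothetical; `get_grid`, `models`, and `predict` are the calls it relies on.

```python
def best_model_and_predictions(grid, frame, metric="auc", decreasing=True):
    """Sort a trained grid by `metric` and predict with the top-ranked model.

    `grid` is expected to behave like a trained H2OGridSearch and `frame`
    like an H2OFrame; both are supplied by the caller.
    """
    # Ask the grid for a copy of itself ordered by the chosen metric.
    sorted_grid = grid.get_grid(sort_by=metric, decreasing=decreasing)
    # The first model in the sorted grid is the best one under that ordering.
    best_model = sorted_grid.models[0]
    # Score the supplied frame with the best model.
    return best_model, best_model.predict(frame)


# Usage with the objects from the question (requires a running H2O cluster):
#   best, preds = best_model_and_predictions(grid_search, test)
#   print(best.model_id)
#   print(preds.head())
```

For metrics where lower is better, such as RMSE or log loss, pass `decreasing=False` so that the best model still comes first.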

Source: the Stack Overflow question "python - H2O Python API : retrieve best models from GridSearch": https://stackoverflow.com/questions/40179875/
