python - Exporting a model with tf Estimator and the export_savedmodel function

Tags: python machine-learning tensorflow neural-network google-cloud-ml-engine

I am building a deep neural network regressor with TensorFlow, based on this tutorial. When I try to save the model with tf.estimator's export_savedmodel, I get the following error:

 raise ValueError('Feature {} is not in features dictionary.'.format(key))
 ValueError: Feature ad_provider is not in features dictionary.

I need to export it so that I can deploy the model to serve predictions on Google Cloud Platform.

This is where I define my columns:

import numpy as np
import pandas as pd
import tensorflow as tf

CSV_COLUMNS = [
    "ad_provider", "device", "split_group", "gold", "secret_areas",
    "scored_enemies", "tutorial_sec", "video_success"
]

FEATURES = ["ad_provider", "device", "split_group", "gold", "secret_areas",
            "scored_enemies", "tutorial_sec"]

LABEL = "video_success"

ad_provider = tf.feature_column.categorical_column_with_vocabulary_list(
    "ad_provider", ["Organic", "Apple Search Ads", "googleadwords_int",
                    "Facebook Ads", "website"])

split_group = tf.feature_column.categorical_column_with_vocabulary_list(
    "split_group", [1,2,3,4])

device = tf.feature_column.categorical_column_with_hash_bucket(
    "device", hash_bucket_size=100)


secret_areas = tf.feature_column.numeric_column("secret_areas")
gold = tf.feature_column.numeric_column("gold")
scored_enemies = tf.feature_column.numeric_column("scored_enemies")
finish_tutorial_sec = tf.feature_column.numeric_column("tutorial_sec")
video_success = tf.feature_column.numeric_column("video_success")


feature_columns = [
    tf.feature_column.indicator_column(ad_provider),
    tf.feature_column.embedding_column(device, dimension=8),
    tf.feature_column.indicator_column(split_group),
    tf.feature_column.numeric_column(key="gold"),
    tf.feature_column.numeric_column(key="secret_areas"),
    tf.feature_column.numeric_column(key="scored_enemies"),
    tf.feature_column.numeric_column(key="tutorial_sec"),
]

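After that, I create a serving function so the exported model accepts a JSON dictionary as input. I am not sure whether my serving function is written correctly.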

def json_serving_input_fn():
  """Build the serving inputs."""
  inputs = {}
  for feat in feature_columns:
    inputs[feat.name] = tf.placeholder(
        shape=[None],
        dtype=feat.dtype if hasattr(feat, 'dtype') else tf.string)

  features = {
    key: tf.expand_dims(tensor, -1)
    for key, tensor in inputs.items()
  }
  return tf.contrib.learn.InputFnOps(features, None, inputs)

Here is the rest of my code:

def main(unused_argv):

  # Normalize the 'gold', 'tutorial_sec' and 'scored_enemies' columns of the training set
  train_n = training_set
  train_n['gold'] = (train_n['gold'] - train_n['gold'].mean()) / (train_n['gold'].max() - train_n['gold'].min())
  train_n['tutorial_sec'] = (train_n['tutorial_sec'] - train_n['tutorial_sec'].mean()) / (train_n['tutorial_sec'].max() - train_n['tutorial_sec'].min())
  train_n['scored_enemies'] = (train_n['scored_enemies'] - train_n['scored_enemies'].mean()) / (train_n['scored_enemies'].max() - train_n['scored_enemies'].min())

  test_n = test_set
  test_n['gold'] = (test_n['gold'] - test_n['gold'].mean()) / (test_n['gold'].max() - test_n['gold'].min())
  test_n['tutorial_sec'] = (test_n['tutorial_sec'] - test_n['tutorial_sec'].mean()) / (test_n['tutorial_sec'].max() - test_n['tutorial_sec'].min())
  test_n['scored_enemies'] = (test_n['scored_enemies'] - test_n['scored_enemies'].mean()) / (test_n['scored_enemies'].max() - test_n['scored_enemies'].min())

  train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=train_n,
    y=pd.Series(train_n[LABEL].values),
    batch_size=100,
    num_epochs=None,
    shuffle=True)

  test_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=test_n,
    y=pd.Series(test_n[LABEL].values),
    batch_size=100,
    num_epochs=1,
    shuffle=False)


  regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                      hidden_units=[40, 30, 20],
                                      model_dir="model1",
                                      optimizer='RMSProp'
                                      )


  # Train

  regressor.train(input_fn=train_input_fn, steps=5)

  regressor.export_savedmodel("test",json_serving_input_fn)

  #Evaluate loss over one epoch of test_set.
  #For each step, calls `input_fn`, which returns one batch of data.
  ev = regressor.evaluate(
    input_fn=test_input_fn)
  loss_score = ev["loss"]
  print("Loss: {0:f}".format(loss_score))
  for key in sorted(ev):
      print("%s: %s" % (key, ev[key]))


  # Predict over the test set.
  y = regressor.predict(
    input_fn=test_input_fn)
  # Each prediction is a one-element array; take the scalar so the
  # list lines up element-wise with the labels.
  predictions = [p["predictions"][0] for p in y]

  #real = list(p["real"] for p in pd.Series(training_set[LABEL].values))
  real = test_set[LABEL].values
  diff = np.subtract(real, predictions)

  # Mean of the squared errors (not the square of the mean absolute error).
  mse = np.mean(np.square(diff))
  print("Mean Square Error of Test Set = ", mse)

Best answer

Besides the problem you mention, I expect you will run into several other issues in practice:

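  • You are using tf.estimator.DNNRegressor, which was introduced in TensorFlow 1.3. CloudML Engine only officially supports TF 1.2.
  • You are normalizing the features in the pandas DataFrame, which will not happen at serving time (unless you do it on the client side). This introduces training/serving skew, and you will get poor prediction results.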

So let's start by using tf.contrib.learn.DNNRegressor instead, which only requires minor changes:

regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[40, 30, 20],
    model_dir="model1",
    optimizer='RMSProp'
)
regressor.fit(input_fn=train_input_fn, steps=5)
regressor.export_savedmodel("test",json_serving_input_fn)

Note the use of fit instead of train.

(Note: your json_serving_input_fn was in fact already written for TF 1.2 and is not compatible with TF 1.3. That is fine for now.)

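Now, the root cause of the error you are seeing is that the column/feature ad_provider is not in the inputs and features dictionaries (although you do have ad_provider_indicator). This happens because you are iterating over feature_columns rather than the original list of input columns. The fix is to iterate over the actual inputs instead of the feature columns; however, we also need to know their types (simplified here to just a few columns):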

CSV_COLUMNS = ["ad_provider", "gold", "video_success"] 
FEATURES = ["ad_provider", "gold"] 
TYPES = [tf.string, tf.float32] 
LABEL = "video_success" 

def json_serving_input_fn(): 
  """Build the serving inputs.""" 
  inputs = {} 
  for feat, dtype in zip(FEATURES, TYPES): 
    inputs[feat] = tf.placeholder(shape=[None], dtype=dtype) 

  features = {
    key: tf.expand_dims(tensor, -1)
    for key, tensor in inputs.items()
  }
  return tf.contrib.learn.InputFnOps(features, None, inputs)
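
To make the key mismatch concrete, here is a small TensorFlow-free sketch. It assumes, as described above, that the indicator column reports its name as ad_provider_indicator while a numeric column keeps its key. The helper missing_keys and the sample instance values are hypothetical and only there for illustration; the {"instances": [...]} wrapper is how I understand the online prediction request body to be laid out, so check it against the current documentation before relying on it.

```python
import json

# Raw input names the model looks up, as in the simplified example above.
FEATURES = ["ad_provider", "gold"]

# Names reported by the wrapped feature columns (assumption based on the
# explanation above: indicator columns get an "_indicator" suffix).
column_names = ["ad_provider_indicator", "gold"]


def missing_keys(placeholder_keys, required=FEATURES):
    """Hypothetical helper: raw feature keys that have no placeholder."""
    return [key for key in required if key not in placeholder_keys]


# Keying placeholders by column names loses "ad_provider" ...
broken = missing_keys(column_names)
# ... while keying them by the raw input names covers everything.
fixed = missing_keys(FEATURES)

# A made-up prediction instance, keyed by the same raw input names.
instance = {"ad_provider": "Organic", "gold": 0.25}
request_body = json.dumps({"instances": [instance]})

print("missing with column names:", broken)
print("missing with raw names:", fixed)
print(request_body)
```

The point is simply that the keys of the placeholders, the keys the feature columns look up, and the keys in each JSON instance all have to be the same raw names.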

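Finally, to normalize your data, you will probably want to do it in the graph. You could try tf.transform, or alternatively write a custom estimator that performs the transformation and delegates the actual model implementation to DNNRegressor.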
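
To illustrate why the normalization matters, here is a minimal plain-Python sketch of the skew. It is not tf.transform and not an in-graph solution; it only shows that the same raw value maps to different model inputs depending on which statistics are used. The function names fit_stats and normalize and all numbers are made up for the example; the formula mirrors the (x - mean) / (max - min) scaling from the question.

```python
def fit_stats(values):
    """Compute the statistics used by the question's normalization."""
    return {
        "mean": sum(values) / len(values),
        "range": max(values) - min(values),  # assumed non-zero here
    }


def normalize(value, stats):
    """Apply (x - mean) / (max - min) with previously computed statistics."""
    return (value - stats["mean"]) / stats["range"]


# Made-up 'gold' values for two data sets with different distributions.
train_gold = [0.0, 50.0, 100.0]
test_gold = [100.0, 150.0, 200.0]

train_stats = fit_stats(train_gold)
test_stats = fit_stats(test_gold)

raw = 100.0
# What the model learned to expect for a raw value of 100.
as_seen_in_training = normalize(raw, train_stats)
# What it would receive if the other data set's own statistics were used.
with_test_stats = normalize(raw, test_stats)

print(as_seen_in_training, with_test_stats)
```

Whatever mechanism you choose, the statistics should be computed once on the training data and then reused unchanged for evaluation and serving, which is what moving the transformation into the graph gives you.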

Regarding python - Exporting a model with tf Estimator and the export_savedmodel function, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46177828/
