tensorflow - Feeding tf.contrib.learn inputs into a DNNClassifier

Tags: tensorflow, google-cloud-datalab

I'm new to tensorflow and Stack Overflow, so apologies in advance for any silly mistakes. I've had good success feeding the lower-level interface, so I decided to try the higher-level tf.contrib.learn API, since it looked straightforward. I'm working in Google Cloud Datalab (a Jupyter notebook), but I've hit a wall and could use some help.

Main question: how do I instantiate a DNNClassifier so that I can feed it a feature that is itself a list of tf.float32 values?

Here are the details. I'm reading a TFRecords-based input file with the following code:

def read_and_decode(filename_queue):  
    # get a tensorflow reader and read in an example
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)

    # parse a single example
    features = tf.parse_single_example(serialized_example, features={ 
               'label': tf.FixedLenFeature([], tf.int64),
               'features': tf.FixedLenFeature([], tf.string)} )

    # convert to tensors and return
    bag_of_words = tf.decode_raw(features['features'], tf.float32)
    bag_of_words.set_shape([LEN_OF_LEXICON])
    label = tf.cast(features['label'], tf.int32) 

    return bag_of_words, label

My unit test looks like this:

# unit test
filename = VALIDATION_FILE
my_filename_queue = tf.train.string_input_producer([filename],
                                                   num_epochs=1)
x, y = read_and_decode(my_filename_queue)
print ('x[0] -> ', x[0])
print ('x[1] -> ', x[1])
print ('y -> ', y, 'type -> ', type(y))
print ('x -> ', x, 'type -> ', type(x))

and prints the following output:

x[0] ->  Tensor("strided_slice_6:0", shape=(), dtype=float32)
x[1] ->  Tensor("strided_slice_7:0", shape=(), dtype=float32)
y ->  Tensor("Cast_6:0", shape=(), dtype=int32) type ->  <class 
'tensorflow.python.framework.ops.Tensor'>
x ->  Tensor("DecodeRaw_3:0", shape=(2633,), dtype=float32) type ->
<class 'tensorflow.python.framework.ops.Tensor'>
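
For reference, this unit test only prints the graph tensors. To see actual values, the ops have to be evaluated in a session with the queue runners started; the following is a minimal sketch, assuming the same TF 1.x queue setup and the VALIDATION_FILE constant used above:

import tensorflow as tf

# rebuild the unit-test graph
filename_queue = tf.train.string_input_producer([VALIDATION_FILE], num_epochs=1)
x, y = read_and_decode(filename_queue)

with tf.Session() as sess:
    # num_epochs creates a local variable, so initialize both collections
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])

    # start the queue runners that feed string_input_producer
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    bag, label = sess.run([x, y])  # pull one decoded example
    print ('bag_of_words shape -> ', bag.shape, 'label -> ', label)

    coord.request_stop()
    coord.join(threads)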

The read_and_decode function is called by input_pipeline, which has the following definition and unit test:

def input_pipeline(filenames, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(filenames,
                                                    num_epochs=num_epochs,
                                                    shuffle=True)

    example, label = read_and_decode(filename_queue)

    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)

    return example_batch, label_batch

# unit test
x, y = input_pipeline([VALIDATION_FILE], BATCH_SIZE, num_epochs=1)
print ('y -> ', y, 'type -> ', type(y))
print ('x -> ', x, 'type -> ', type(x))

and produces the following output:

y ->  Tensor("shuffle_batch_4:1", shape=(100,), dtype=int32) type ->
<class 'tensorflow.python.framework.ops.Tensor'>
x ->  Tensor("shuffle_batch_4:0", shape=(100, 2633), dtype=float32) 
type -> <class 'tensorflow.python.framework.ops.Tensor'>

The trainer that consumes these feeds looks like this:

def run_training():
    # feature_columns = ????????????
    feature_columns = tf.contrib.layers.real_valued_column(
        "", dimension=LEN_OF_LEXICON, dtype=tf.float32)

    estimator = tf.contrib.learn.DNNClassifier(
        feature_columns=feature_columns,
        n_classes=5,
        hidden_units=[1024, 512, 256],
        optimizer=tf.train.ProximalAdagradOptimizer(
            learning_rate=0.1,
            l1_regularization_strength=0.001))

    estimator.fit(input_fn=lambda: input_pipeline([VALIDATION_FILE],
                                                  BATCH_SIZE, num_epochs=1))

# unit test
run_training()

The DNNClassifier instantiates without complaint, but the call to estimator.fit() raises an exception (traceback snippet below). My input_pipeline is providing the feed in the form shown in the tensorflow documentation, yet somehow the data inside the tensors doesn't seem to be in the right form. Does anyone have any ideas?

---------------- Traceback Snippet -----------------
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/dnn.pyc in _dnn_model_fn(features, labels, mode, params, config)
    126         feature_columns=feature_columns,
    127         weight_collections=[parent_scope],
--> 128         scope=scope)
    129 
    130   hidden_layer_partitioner = (
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in input_from_feature_columns(columns_to_tensors, feature_columns, weight_collections, trainable, scope)
    247                                      scope,
    248                                      output_rank=2,
--> 249                                      default_name='input_from_feature_columns')
    250 
    251 
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in _input_from_feature_columns(columns_to_tensors, feature_columns, weight_collections, trainable, scope, output_rank, default_name)
    145                                 default_name):
    146   """Implementation of `input_from(_sequence)_feature_columns`."""
--> 147   check_feature_columns(feature_columns)
    148   with variable_scope.variable_scope(scope,
    149                                      default_name=default_name,
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in check_feature_columns(feature_columns)
    806   seen_keys = set()
    807   for f in feature_columns:
--> 808     key = f.key
    809     if key in seen_keys:
    810       raise ValueError('Duplicate feature column key found for column: {}. '
AttributeError: 'str' object has no attribute 'key'
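For reference, the traceback itself points at the immediate cause: check_feature_columns iterates over the feature_columns argument, and a single real_valued_column behaves like a namedtuple, so iterating it yields its string fields (starting with the empty column name) rather than column objects, which is where 'str' object has no attribute 'key' comes from. A hedged workaround, not the accepted answer below, would be to pass an iterable of columns:

# sketch of a possible workaround: wrap the single column in a list so the
# estimator iterates over column objects, not over the column's fields
feature_columns = [tf.contrib.layers.real_valued_column(
    "", dimension=LEN_OF_LEXICON, dtype=tf.float32)]

estimator = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    n_classes=5,
    hidden_units=[1024, 512, 256])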

Best Answer

The solution was to use this function:

feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input_fn(lambda: input_pipeline([INPUT_FILE], BATCH_SIZE, num_epochs=1))

It infers the columns from the output signature of the input_fn. Quick and easy!
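Plugged back into the trainer from the question, the fix looks roughly like this; a sketch only, assuming the same input_pipeline, VALIDATION_FILE, and BATCH_SIZE as above:

def run_training():
    # let tf.contrib.learn inspect one batch from the input_fn and build
    # matching real-valued feature columns (shape (BATCH_SIZE, LEN_OF_LEXICON))
    feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input_fn(
        lambda: input_pipeline([VALIDATION_FILE], BATCH_SIZE, num_epochs=1))

    estimator = tf.contrib.learn.DNNClassifier(
        feature_columns=feature_columns,
        n_classes=5,
        hidden_units=[1024, 512, 256],
        optimizer=tf.train.ProximalAdagradOptimizer(
            learning_rate=0.1,
            l1_regularization_strength=0.001))

    estimator.fit(input_fn=lambda: input_pipeline([VALIDATION_FILE],
                                                  BATCH_SIZE, num_epochs=1))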

Regarding "tensorflow - Feeding tf.contrib.learn inputs into a DNNClassifier", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43684973/
