python - NaN cost when training a perceptron in TensorFlow

Tags: python tensorflow

I am trying to train a single-layer perceptron in TensorFlow (I based my code on this) on the following data file:

1,1,0.05,-1.05
1,1,0.1,-1.1
....

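The last column is the label (a function of three parameters), and the first three columns are the function's arguments. Here is the code that reads the data and trains the model (simplified for readability):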

import tensorflow as tf

... # some basics to read the data
example, label = read_file_format(filename_queue)
... # model construction and parameter setting
n_hidden_1 = 4 # 1st layer number of features
n_input = 3
n_output = 1
...

# calls a function which produces a prediction
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(training_epochs):
        _, c = sess.run([optimizer, cost], feed_dict={x: example.reshape(1,3), y: label.reshape(-1,1)})
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "Cost:",c)

But when I run it, something is clearly wrong:

('Epoch:', '0001', 'Cost:', nan)
('Epoch:', '0002', 'Cost:', nan)
....
('Epoch:', '0015', 'Cost:', nan)

Here is the full code for the multilayer_perceptron function and the rest of the setup:

# Parameters
learning_rate = 0.001
training_epochs = 15
display_step = 1

# Network Parameters
n_hidden_1 = 4 # 1st layer number of features
n_input = 3 
n_output = 1 

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_output])

# Create model
def multilayer_perceptron(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_1, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_hidden_1, n_output]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'out': tf.Variable(tf.random_normal([n_output]))
}

Best answer

Are you feeding one example at a time? I would batch the data and raise the batch size to 128 or so for as long as you keep getting NaNs.
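As a rough sketch of the batching suggestion, assuming the rows have already been parsed into lists of floats. The helper name `iter_batches` and the third sample row are hypothetical and not part of the question's code:

```python
def iter_batches(rows, batch_size=128):
    """Yield (features, labels) mini-batches from parsed rows.

    Each row is assumed to look like [x1, x2, x3, label],
    matching the data file shown in the question.
    """
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        features = [row[:3] for row in chunk]   # shape: (batch, 3)
        labels = [[row[3]] for row in chunk]    # shape: (batch, 1)
        yield features, labels


# Hypothetical sample rows in the question's format.
rows = [
    [1.0, 1.0, 0.05, -1.05],
    [1.0, 1.0, 0.1, -1.1],
    [1.0, 1.0, 0.15, -1.15],
]
batches = list(iter_batches(rows, batch_size=2))
```

Each `(features, labels)` pair has the shapes the `x` and `y` placeholders expect, so it could presumably be passed as `feed_dict={x: features, y: labels}` in place of the single reshaped example.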

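When I get NaNs, it is usually one of these three things:
- the batch size is too small (in your case it is just 1)
- a log(0) somewhere
- the learning rate is too high and the gradients are not clipped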
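The last two causes can be illustrated in plain Python, without TensorFlow. This is only a sketch: `safe_log` and `clip_by_norm` are hypothetical helper names, and the epsilon and norm limit are arbitrary choices. In a TensorFlow graph, `tf.clip_by_value` and `tf.clip_by_norm` serve a similar purpose.

```python
import math


def safe_log(p, eps=1e-12):
    """Log with the argument clamped away from zero, so log(0) never occurs."""
    return math.log(max(p, eps))


def clip_by_norm(grad, max_norm):
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


# In IEEE floating point, log(0) evaluates to -inf and 0 * -inf is nan.
# This is how a single zero probability can poison a cross-entropy sum.
poisoned = 0.0 * float("-inf")

# With the clamped log the same term stays finite.
guarded = 0.0 * safe_log(0.0)

# A gradient with L2 norm 50 is scaled down to norm 5, keeping its direction.
clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)
```

Clipping bounds the size of each update, so one large gradient cannot push the weights to inf on its own. Lowering the learning rate attacks the same problem from the other side.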

This question, "python - NaN cost when training a perceptron in TensorFlow", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/39470802/
