python - TensorFlow RNN NaN error

Tags: python, tensorflow, lstm

I want to train an RNN model that connects articles and images. The input and output are two arrays.

I define the RNN's parameters as follows:

learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 128   # feature vector size per timestep
n_steps = 168   # timesteps
n_hidden = 512  # hidden layer num of features
output = 200    # article (output) vector length

Each image gives a 128*168 feature matrix (168 timesteps of 128 features, 21504 values in total), and each article is represented by a 200-dimensional vector.

cost = tf.reduce_mean(pow(pred-y,2)/2) 
#cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

My goal is to train a network that converts an image into an article. However, when I try to train the model, the cost comes back as NaN.

Here is the code:

# coding=utf-8
from __future__ import print_function
from tensorflow.contrib import rnn
import scipy.io as scio
import tensorflow as tf
import numpy as np
import os
TextPath = 'F://matlab_code//readtxt//ImageTextVector.mat'
ImageDirPath = 'F://matlab_code//CVPR10-LLC//features//1'
Text = scio.loadmat(TextPath)

learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 128   # feature vector size per timestep
n_steps = 168   # timesteps
n_hidden = 512  # hidden layer num of features
output = 200    # article (output) vector length

x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, output])

weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, output]))
}
biases = {
    'out': tf.Variable(tf.random_normal([output]))
}

def RNN(x, weights, biases):

    # static_rnn expects a length-n_steps list of [batch_size, n_input] tensors,
    # so split the [batch, n_steps, n_input] placeholder along the time axis
    x = tf.unstack(x, n_steps, 1)

    lstm_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    # use the output of the last timestep for the linear projection to the article vector
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(pow(pred-y,2)/2) 
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

init = tf.global_variables_initializer()

train_count = 0
with tf.Session() as sess:
    sess.run(init)
    step = 0
    while step* batch_size < training_iters:
        iter = step*batch_size
        batch_x = []
        batch_y = []
        while iter < (step+1)*batch_size:
            ImagePath = ImageDirPath + '//' + Text['X'][train_count][0][0] +'.mat'
            if os.path.exists(ImagePath):
                batch_xx=[]
                batch_yy=[]
                Image = scio.loadmat(ImagePath)
                i=0
                while i < 21504:  # 21504 = n_steps * n_input = 168 * 128 feature values per image
                    batch_xx.append(Image['fea'][i][0])
                    i=i+1
                batch_yy = Text['X'][train_count][1][0]
                batch_xx = np.array(batch_xx)
                batch_x=np.hstack((batch_x,batch_xx))
                batch_y=np.hstack((batch_y,batch_yy))
                iter = iter+1
            train_count=train_count+1
        batch_x = batch_x.reshape((batch_size,n_steps, n_input))
        batch_y = batch_y.reshape((batch_size,output))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

        if step % display_step == 0:
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step* batch_size) + ", Minibatch Loss= " + \
                 "{:.6f}".format(loss) )
        step += 1
    print("Optimization Finished!")

Best Answer

When you pass a tensor that contains NaN values into the LSTM, the values inside the LSTM cell are "forced" to NaN, because any numeric operation between a number and NaN yields NaN. Check whether your data contains NaN values, or simply use numpy.nan_to_num to fill in your NaN data.
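
For example, a minimal check before feeding each batch could look like the following sketch (it assumes the batch_x and batch_y NumPy arrays built in the question's loop, with numpy already imported as np):

# Sketch: detect NaN values in the batch and replace them before feeding the graph
if np.isnan(batch_x).any() or np.isnan(batch_y).any():
    print("NaN found in this batch; replacing with np.nan_to_num")
    batch_x = np.nan_to_num(batch_x)   # NaN -> 0.0, +/-Inf -> large finite numbers
    batch_y = np.nan_to_num(batch_y)
sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})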

Regarding this python - TensorFlow RNN NaN error question, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43714594/
