python - TensorFlow invalid argument: Assertion failed [Label IDs must < n_classes]

Tags: python tensorflow

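I get an error when implementing a DNNClassifier in TensorFlow 1.3.0 with Python 2.7. I took the sample code from the TensorFlow tf.estimator Quickstart tutorial and want to run it on my own dataset: 3D coordinates and 10 different classes (int labels). Here is my implementation: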

#!/usr/bin/env python
# -*- coding: utf-8 -*-

def ReadLabels(file):
    #load the labels from test file here
    labelFile = open(file, "r")
    Label = labelFile.readlines();
    returnL = [[Label[i][j+1] for j in range(len(Label[0])-3)] for i in range(len(Label))]
    returnLint = list();
    for i in range(len(returnL)):
        tmp = ''
        for j in range(len(returnL[0])):
            tmp += str(returnL[i][j])
        returnLint.append(int(tmp))
    return returnL, returnLint

def NumpyReadBin(file,numcols,type):
    #load the data from binary file here
    import numpy as np
    trainData = np.fromfile(file,dtype=type)
    numrows = len(trainData)/numcols
    #print trainData[0:100]
    result = [[trainData[i+j*numcols] for i in range(numcols)] for j in range(numrows)]
    return result

def TensorflowDNN():
    #load sample dataset
    trainData = NumpyReadBin('data/TrainingData.dat',3,'float32')
    valData = NumpyReadBin('data/ValidationData.dat',3,'float32')
    testData = NumpyReadBin('data/TestingData.dat',3,'float32')
    #load sample labels
    trainL, trainLint = ReadLabels('data/TrainingLabels.txt')
    validateL, validateLint = ReadLabels('data/ValidationLabels.txt')
    testL, testLint = ReadLabels('data/TestingLabels.txt')

    import tensorflow as tf
    import numpy as np

    #get unique labels
    uniqueTrain = set()
    for l in trainLint:
        uniqueTrain.add(l)
    uniqueTrain = list(uniqueTrain)
    numClasses = len(uniqueTrain)
    numDims = len(trainData[0])

    #All features have real-value data
    feature_columns = [tf.feature_column.numeric_column("x", shape=[3])]

    # Build 3 layer DNN with 10, 20, 10 units respectively.
    classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                              hidden_units=[10, 20, 10],
                                              n_classes=numClasses,
                                              model_dir="../Classification/tmp")

    # Define training inputs
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
                                                x={"x": np.array(trainData)},y=np.array(trainLint),
                                                num_epochs = None, shuffle = True)

    #Train the model
    classifier.train(input_fn = train_input_fn, steps = 2000)

    #Define Validation inputs
    val_input_fn = tf.estimator.inputs.numpy_input_fn(
                                                x={"x": np.array(valData)},y=np.array(validateLint),
                                                num_epochs = 1, shuffle = False)

    # Evaluate accuracy.
    accuracy_score = classifier.evaluate(input_fn=val_input_fn)["accuracy"]
    print("\nTest Accuracy: {0:f}\n".format(accuracy_score))

if __name__ == '__main__':
    TensorflowDNN()

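The functions ReadLabels(...) and NumpyReadBin(...) load the dataset I saved for the tensors. The label function is a bit odd because the labels are integers that I read from a text file, but in the end I get an array containing the integers from those labels: [11, 12, 21, 22, 23, 31, 32, 33, 41, 42].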

However, I cannot classify anything, because the call classifier.train(input_fn = train_input_fn, steps = 2000) raises the following error:

...Traceback and stuff like that...
InvalidArgumentError (see above for traceback): assertion failed: [Label IDs must < n_classes] [Condition x < y did not hold element-wise:x (dnn/head/labels:0) = ] [[21][32][42]...] [y (dnn/head/assert_range/Const:0) = ] [10]
[[Node: dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/Switch/_117, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/data_0, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/data_1, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/Switch_1/_119, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/data_3, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/Switch_2/_121)]]

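Has anyone run into this error before, or does anyone know how to solve it? I guess it is complaining about the number or the format of the classes/labels in my dataset, but I know that trainLint contains 10 different class labels, and that is the value of numClasses. Could it be the format of my trainLint array?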

Best answer

So the solution, as Ishant Mrinal pointed out:

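TensorFlow expects class labels to be integers from 0 up to (but not including) the number of classes, i.e. range(0, num_classes), rather than "arbitrary" numbers as in my case. With labels such as 21, 32 and 42 and n_classes=10, the check "label < n_classes" fails. Thanks! :)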
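A minimal sketch of such a remapping in plain Python, assuming the labels are integers like the ones in the question. The helper name remap_labels and the sample values are hypothetical and only for illustration; they are not part of TensorFlow.

```python
def remap_labels(labels):
    """Map arbitrary integer labels to contiguous IDs in range(0, n_classes)."""
    # Sort the unique labels so the mapping is deterministic across runs.
    unique = sorted(set(labels))
    # Original label -> class ID (its position in the sorted unique list).
    to_id = {label: idx for idx, label in enumerate(unique)}
    ids = [to_id[label] for label in labels]
    return ids, unique


# Hypothetical labels in the style of the question.
train_labels = [21, 32, 42, 11, 21]
train_ids, classes = remap_labels(train_labels)

# train_ids can be wrapped in np.array(...) and passed as y to numpy_input_fn;
# classes[i] recovers the original label for class ID i.
```

The mapping built from the training labels would need to be reused for the validation and test labels, so that the same original label always gets the same class ID.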

...Another option I just came across is to add label_vocabulary to the classifier definition:

classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                          hidden_units=[10, 20, 10],
                                          n_classes=numClasses,
                                          model_dir=saveAt,
                                          label_vocabulary=uniqueTrain)

With this option I can keep the labels defined as before, as long as they are converted to strings.
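A rough sketch of that string conversion, under the assumption (based on the tf.estimator documentation as I understand it) that label_vocabulary takes a list of strings and that the labels fed to the estimator must then be strings as well. The helper name to_string_labels and the sample values are hypothetical.

```python
def to_string_labels(int_labels):
    """Return the labels as strings plus a sorted string vocabulary."""
    # Vocabulary: every distinct label, once, as a string.
    vocabulary = [str(label) for label in sorted(set(int_labels))]
    # The labels themselves, converted one by one.
    string_labels = [str(label) for label in int_labels]
    return string_labels, vocabulary


# Hypothetical labels in the style of the question.
train_labels = [21, 32, 42, 11, 21]
train_strings, vocab = to_string_labels(train_labels)

# vocab would be passed as label_vocabulary=vocab to DNNClassifier, and
# np.array(train_strings) as y to numpy_input_fn, in place of the integer labels.
```

Compared with remapping to range(0, num_classes), this keeps the original label values visible in the model definition, at the cost of feeding string labels everywhere.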

Regarding python - TensorFlow invalid argument: Assertion failed [Label IDs must < n_classes], a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45813746/
