python - What is `target` in `ClassificationDataSet` good for?

Tags: python pybrain

I tried to find out what the `target` parameter of `ClassificationDataSet` can be used for, but it is still not clear to me.

What I tried

>>> from pybrain.datasets import ClassificationDataSet
>>> help(ClassificationDataSet)
Help on class ClassificationDataSet in module pybrain.datasets.classification:

class ClassificationDataSet(pybrain.datasets.supervised.SupervisedDataSet)
 |  Specialized data set for classification data. Classes are to be numbered from 0 to nb_classes-1.
 |  
 |  Method resolution order:
 |      ClassificationDataSet
 |      pybrain.datasets.supervised.SupervisedDataSet
 |      pybrain.datasets.dataset.DataSet
 |      pybrain.utilities.Serializable
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  __add__(self, other)
 |      Adds the patterns of two datasets, if dimensions and type match.
 |  
 |  __init__(self, inp, target=1, nb_classes=0, class_labels=None)
 |      Initialize an empty dataset. 
 |      
 |      `inp` is used to specify the dimensionality of the input. While the 
 |      number of targets is given by implicitly by the training samples, it can
 |      also be set explicity by `nb_classes`. To give the classes names, supply
 |      an iterable of strings as `class_labels`.
 |  
 |  __reduce__(self)

Since this contains no information about `target` (apart from its default of 1), I looked at the source code of `ClassificationDataSet`:

class ClassificationDataSet(SupervisedDataSet):
    """ Specialized data set for classification data. Classes are to be numbered from 0 to nb_classes-1. """

    def __init__(self, inp, target=1, nb_classes=0, class_labels=None):
        """Initialize an empty dataset.

        `inp` is used to specify the dimensionality of the input. While the
        number of targets is given by implicitly by the training samples, it can
        also be set explicity by `nb_classes`. To give the classes names, supply
        an iterable of strings as `class_labels`."""
        # FIXME: hard to keep nClasses synchronized if appendLinked() etc. is used.
        SupervisedDataSet.__init__(self, inp, target)
        self.addField('class', 1)
        self.nClasses = nb_classes
        if len(self) > 0:
            # calculate class histogram, if we already have data
            self.calculateStatistics()
        self.convertField('target', int)
        if class_labels is None:
            self.class_labels = list(set(self.getField('target').flatten()))
        else:
            self.class_labels = class_labels
        # copy classes (may be changed into other representation)
        self.setField('class', self.getField('target'))

That was still unclear, so I looked at `SupervisedDataSet`:

class SupervisedDataSet(DataSet):
    """SupervisedDataSets have two fields, one for input and one for the target.
    """

    def __init__(self, inp, target):
        """Initialize an empty supervised dataset.

        Pass `inp` and `target` to specify the dimensions of the input and
        target vectors."""
        DataSet.__init__(self)
        if isscalar(inp):
            # add input and target fields and link them
            self.addField('input', inp)
            self.addField('target', target)
        else:
            self.setField('input', inp)
            self.setField('target', target)

        self.linkFields(['input', 'target'])

        # reset the index marker
        self.index = 0

        # the input and target dimensions
        self.indim = self.getDimension('input')
        self.outdim = self.getDimension('target')

It seems to be about the output dimension. But shouldn't `target` then be `nb_classes`?

Best answer

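The `target` argument is the dimensionality of each training sample's output. To fully understand the difference between it and `nb_classes`, let's look at the `_convertToOneOfMany` method: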

def _convertToOneOfMany(self, bounds=(0, 1)):
    """Converts the target classes to a 1-of-k representation, retaining the
    old targets as a field `class`.

    To supply specific bounds, set the `bounds` parameter, which consists of
    target values for non-membership and membership."""
    if self.outdim != 1:
        # we already have the correct representation (hopefully...)
        return
    if self.nClasses <= 0:
        self.calculateStatistics()
    oldtarg = self.getField('target')
    newtarg = zeros([len(self), self.nClasses], dtype='Int32') + bounds[0]
    for i in range(len(self)):
        newtarg[i, int(oldtarg[i])] = bounds[1]
    self.setField('target', newtarg)
    self.setField('class', oldtarg)

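So, in theory, `target` is the dimension of the output, while `nb_classes` is the number of classes. This is useful for data transformation. For example, suppose we have the following data for training a network on the XOR function: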

 IN   OUT
[0,0],0
[0,1],1
[1,0],1
[1,1],0
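To make the two numbers concrete for the table above, here is a minimal plain-Python sketch (it does not use PyBrain; `xor_samples`, `target_dim`, and `class_labels` are illustrative names, not part of the library). It only counts what `target` and `nb_classes` are meant to describe:

```python
# XOR training data: each sample is (input vector, target vector).
xor_samples = [
    ([0, 0], [0]),
    ([0, 1], [1]),
    ([1, 0], [1]),
    ([1, 1], [0]),
]

# Length of one target vector: this is what `target` describes.
target_dim = len(xor_samples[0][1])

# Distinct class labels found in the targets: their count is
# what `nb_classes` describes.
class_labels = sorted({t[0] for _, t in xor_samples})
nb_classes = len(class_labels)

print(target_dim)    # 1
print(class_labels)  # [0, 1]
print(nb_classes)    # 2
```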

So the output has dimension 1, but there are two output classes: 0 and 1. We can therefore change the data to:

 IN    OUT
[0,0],(0,1)
[0,1],(1,0)
[1,0],(1,0)
[1,1],(0,1)

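Now the first element of the output is the value for True and the second is the value for False. This is common practice when there are more classes, for example in handwriting recognition.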
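The conversion can be sketched in plain Python as follows. `to_one_of_many` is a hypothetical helper that mirrors the loop in `_convertToOneOfMany` shown earlier; it is not PyBrain's own code, and deriving the class count as `max + 1` is an assumption of this sketch. Following that method's indexing, column i stands for class i, so class 0 becomes [1, 0]. The table above lists the two columns in the opposite order, which is just a different labeling convention.

```python
def to_one_of_many(classes, nb_classes=0, bounds=(0, 1)):
    """Turn a list of integer class indices into 1-of-k rows.

    Hypothetical helper mirroring the logic of `_convertToOneOfMany`;
    `bounds` holds the values for non-membership and membership.
    """
    if nb_classes <= 0:
        # Assumption of this sketch: classes are numbered 0..k-1.
        nb_classes = max(classes) + 1
    rows = []
    for c in classes:
        row = [bounds[0]] * nb_classes  # start with "not a member" everywhere
        row[int(c)] = bounds[1]         # mark the column of the actual class
        rows.append(row)
    return rows


# XOR targets from the first table: output dimension 1, two classes.
xor_classes = [0, 1, 1, 0]
xor_one_hot = to_one_of_many(xor_classes)
print(xor_one_hot)  # [[1, 0], [0, 1], [0, 1], [1, 0]]

# With more classes, e.g. ten digits, each target becomes a row of length 10.
digit_one_hot = to_one_of_many([3], nb_classes=10)
print(digit_one_hot)  # [[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]]
```

After such a conversion the output dimension equals the number of classes, which is why `_convertToOneOfMany` returns early when `self.outdim != 1`.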

Hope that clears it up for you.

Regarding python - What is `target` in `ClassificationDataSet` good for?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24231157/
