python - TSNE and **mahalanobis** metric from **sklearn**

Tags: python python-3.x scikit-learn

Using sklearn's TSNE with the mahalanobis metric, I get the following error:

from sklearn.manifold import TSNE      
tsne = TSNE( verbose=1, perplexity=40, n_iter=250,learning_rate=50, random_state=0,metric='mahalanobis')
pt=data.sample(frac=0.1).values
tsne_results = tsne.fit_transform(pt)

ValueError: Must provide either V or VI for Mahalanobis distance

How can I provide method_parameters for the Mahalanobis distance?
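For context: the Mahalanobis distance between two samples x and y is sqrt((x - y)^T VI (x - y)), where VI is the inverse of the feature covariance matrix V, which is why the metric cannot be evaluated until V or VI is supplied. A minimal NumPy sketch on random stand-in data (the helper name `mahalanobis_pairwise` is illustrative, not a library function); note that V has to be computed over the columns:

```python
import numpy as np

def mahalanobis_pairwise(X, VI):
    """Pairwise Mahalanobis distances between the rows of X.

    VI is the inverse of the (n_features x n_features) covariance matrix.
    """
    diff = X[:, None, :] - X[None, :, :]              # shape (n, n, n_features)
    sq = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)  # (x - y)^T VI (x - y)
    return np.sqrt(np.maximum(sq, 0.0))               # clip tiny negative round-off

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                          # 50 samples, 4 features

# V is computed over features (columns), hence rowvar=False;
# np.cov(X) would treat each *row* as a variable and return a 50 x 50 matrix.
V = np.cov(X, rowvar=False)
VI = np.linalg.inv(V)

D = mahalanobis_pairwise(X, VI)
print(V.shape, np.cov(X).shape, D.shape)  # (4, 4) (50, 50) (50, 50)
```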

Best answer

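Indeed, there is no option to define metric_params as there is in other cases. Other classes based on pairwise distances provide a metric_params parameter for passing additional arguments to the distance function. For example, NearestNeighbors has this: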

metric_params : dict, optional (default = None)

    Additional keyword arguments for the metric function.

The answer linked here shows how to use this parameter.
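As an illustration of that parameter (a sketch on random data, not the code from the linked answer), `NearestNeighbors` forwards the `metric_params` dict to the Mahalanobis metric, so the inverse covariance can be passed as `VI`. `algorithm="ball_tree"` is forced here so the parameters go to scikit-learn's own distance metric; leaving `metric_params` out of this call should raise the same "Must provide either V or VI" error as in the question:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Inverse of the (3 x 3) feature covariance matrix
VI = np.linalg.inv(np.cov(X, rowvar=False))

# metric_params is handed to the distance metric as keyword arguments
knn = NearestNeighbors(n_neighbors=5, algorithm="ball_tree",
                       metric="mahalanobis", metric_params={"VI": VI})
knn.fit(X)

# Querying the first three training points: each one is its own nearest neighbor
distances, indices = knn.kneighbors(X[:3])
print(distances.shape, indices.shape)  # (3, 5) (3, 5)
```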

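But TSNE has no way to pass additional parameters. So for now, you need to extend the class and override __init__() to accept the parameter, and then override the _fit() method to actually use it.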
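The pattern itself is plain Python: the subclass accepts one extra keyword, stores it in `__init__` under the same name (scikit-learn's `get_params` reads the `__init__` signature and looks up attributes with those names), and unpacks it where the distance is computed, falling back to no extra arguments when it is `None`. A toy sketch with hypothetical classes `Embedder` / `ParamEmbedder` and a hypothetical metric `weighted_dist`; none of this is scikit-learn code:

```python
def weighted_dist(a, b, w=None):
    """Hypothetical metric that needs an extra keyword argument."""
    if w is None:
        raise ValueError("Must provide w")
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)) ** 0.5

class Embedder:
    """Stand-in for a class whose metric call cannot take extra kwargs."""
    def __init__(self, metric=weighted_dist):
        self.metric = metric

    def _fit(self, X):
        return [[self.metric(a, b) for b in X] for a in X]

class ParamEmbedder(Embedder):
    def __init__(self, metric=weighted_dist, metric_params=None):
        super().__init__(metric=metric)
        self.metric_params = metric_params      # new keyword, stored as-is

    def _fit(self, X):
        params = self.metric_params or {}       # None -> no extra kwargs
        return [[self.metric(a, b, **params) for b in X] for a in X]

X = [(0.0, 0.0), (3.0, 4.0)]
print(ParamEmbedder(metric_params={"w": (1.0, 1.0)})._fit(X))  # [[0.0, 5.0], [5.0, 0.0]]
```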

We can do it like this:

from time import time
import numpy as np
import scipy.sparse as sp
from sklearn.manifold import TSNE
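# NOTE: this subclass relies on private scikit-learn 0.20-era internals
# (sklearn.externals.six, sklearn.manifold.t_sne and its helpers); later
# releases removed or reorganized them, so pin a matching version or adapt.
from sklearn.externals.six import string_types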
from sklearn.utils import check_array, check_random_state
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.manifold.t_sne import _joint_probabilities, _joint_probabilities_nn
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import PCA

class MyTSNE(TSNE):
    def __init__(self, n_components=2, perplexity=30.0,
                 early_exaggeration=12.0, learning_rate=200.0, n_iter=1000,
                 n_iter_without_progress=300, min_grad_norm=1e-7,
                 metric="euclidean", metric_params=None, #<=ADDED
                 init="random", verbose=0,
                 random_state=None, method='barnes_hut', angle=0.5):
        self.n_components = n_components
        self.perplexity = perplexity
        self.early_exaggeration = early_exaggeration
        self.learning_rate = learning_rate
        self.n_iter = n_iter
        self.n_iter_without_progress = n_iter_without_progress
        self.min_grad_norm = min_grad_norm
        self.metric = metric
        self.metric_params = metric_params  #<=ADDED
        self.init = init
        self.verbose = verbose
        self.random_state = random_state
        self.method = method
        self.angle = angle

    def _fit(self, X, skip_num_points=0):
        if self.method not in ['barnes_hut', 'exact']:
            raise ValueError("'method' must be 'barnes_hut' or 'exact'")
        if self.angle < 0.0 or self.angle > 1.0:
            raise ValueError("'angle' must be between 0.0 - 1.0")
        if self.metric == "precomputed":
            if isinstance(self.init, string_types) and self.init == 'pca':
                raise ValueError("The parameter init=\"pca\" cannot be "
                                 "used with metric=\"precomputed\".")
            if X.shape[0] != X.shape[1]:
                raise ValueError("X should be a square distance matrix")
            if np.any(X < 0):
                raise ValueError("All distances should be positive, the "
                                 "precomputed distances given as X is not "
                                 "correct")
        if self.method == 'barnes_hut' and sp.issparse(X):
            raise TypeError('A sparse matrix was passed, but dense '
                            'data is required for method="barnes_hut". Use '
                            'X.toarray() to convert to a dense numpy array if '
                            'the array is small enough for it to fit in '
                            'memory. Otherwise consider dimensionality '
                            'reduction techniques (e.g. TruncatedSVD)')
        else:
            X = check_array(X, accept_sparse=['csr', 'csc', 'coo'],
                            dtype=[np.float32, np.float64])
        if self.method == 'barnes_hut' and self.n_components > 3:
            raise ValueError("'n_components' should be inferior to 4 for the "
                             "barnes_hut algorithm as it relies on "
                             "quad-tree or oct-tree.")
        random_state = check_random_state(self.random_state)

        if self.early_exaggeration < 1.0:
            raise ValueError("early_exaggeration must be at least 1, but is {}"
                             .format(self.early_exaggeration))

        if self.n_iter < 250:
            raise ValueError("n_iter should be at least 250")

        n_samples = X.shape[0]

        neighbors_nn = None
        if self.method == "exact":
            if self.metric == "precomputed":
                distances = X
            else:
                if self.verbose:
                    print("[t-SNE] Computing pairwise distances...")

                # metric_params defaults to None, so fall back to an empty dict
                metric_params = self.metric_params or {}  #<=ADDED
                if self.metric == "euclidean":
                    distances = pairwise_distances(X, metric=self.metric,
                                                   squared=True,
                                                   **metric_params) #<=ADDED
                else:
                    distances = pairwise_distances(X, metric=self.metric,
                                                   **metric_params) #<=ADDED

                if np.any(distances < 0):
                    raise ValueError("All distances should be positive, the "
                                     "metric given is not correct")

            P = _joint_probabilities(distances, self.perplexity, self.verbose)
            assert np.all(np.isfinite(P)), "All probabilities should be finite"
            assert np.all(P >= 0), "All probabilities should be non-negative"
            assert np.all(P <= 1), ("All probabilities should be less "
                                    "than or equal to one")

        else:
            k = min(n_samples - 1, int(3. * self.perplexity + 1))

            if self.verbose:
                print("[t-SNE] Computing {} nearest neighbors...".format(k))

            knn = NearestNeighbors(algorithm='auto', n_neighbors=k,
                                   metric=self.metric,
                                   metric_params=self.metric_params) #<=ADDED
            t0 = time()
            knn.fit(X)
            duration = time() - t0
            if self.verbose:
                print("[t-SNE] Indexed {} samples in {:.3f}s...".format(
                    n_samples, duration))

            t0 = time()
            distances_nn, neighbors_nn = knn.kneighbors(
                None, n_neighbors=k)
            duration = time() - t0
            if self.verbose:
                print("[t-SNE] Computed neighbors for {} samples in {:.3f}s..."
                      .format(n_samples, duration))

            del knn

            if self.metric == "euclidean":
                distances_nn **= 2

            P = _joint_probabilities_nn(distances_nn, neighbors_nn,
                                        self.perplexity, self.verbose)

        if isinstance(self.init, np.ndarray):
            X_embedded = self.init
        elif self.init == 'pca':
            pca = PCA(n_components=self.n_components, svd_solver='randomized',
                      random_state=random_state)
            X_embedded = pca.fit_transform(X).astype(np.float32, copy=False)
        elif self.init == 'random':
            X_embedded = 1e-4 * random_state.randn(
                n_samples, self.n_components).astype(np.float32)
        else:
            raise ValueError("'init' must be 'pca', 'random', or "
                             "a numpy array")

        degrees_of_freedom = max(self.n_components - 1.0, 1)

        return self._tsne(P, degrees_of_freedom, n_samples,
                          X_embedded=X_embedded,
                          neighbors=neighbors_nn,
                          skip_num_points=skip_num_points)

I have marked the changes with (#<=ADDED). Now try using this class instead of TSNE, like this:

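pt = data.sample(frac=0.1).values

# The covariance must be taken over features (columns), hence rowvar=False;
# np.cov(pt) would treat each row as a variable. Its inverse, passed as VI,
# is accepted by both scikit-learn's and SciPy's Mahalanobis implementations.
tsne = MyTSNE(verbose=1, perplexity=40, n_iter=250, learning_rate=50, random_state=0,
              metric='mahalanobis',
              metric_params={'VI': np.linalg.inv(np.cov(pt, rowvar=False))})
tsne_results = tsne.fit_transform(pt)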
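If subclassing private internals is undesirable, another option (not part of the original answer) is to compute the Mahalanobis distance matrix yourself and pass it to the stock `TSNE` with `metric='precomputed'`. A sketch on random stand-in data: `init='random'` is set explicitly because `TSNE` rejects `init='pca'` together with a precomputed metric, and `n_iter` is left at its default because its name differs across scikit-learn versions:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
pt = rng.normal(size=(150, 5))     # stand-in for data.sample(frac=0.1).values

# Full n_samples x n_samples Mahalanobis distance matrix; VI is the
# inverse covariance computed over features (columns).
VI = np.linalg.inv(np.cov(pt, rowvar=False))
D = squareform(pdist(pt, metric="mahalanobis", VI=VI))

tsne = TSNE(n_components=2, perplexity=40, learning_rate=50,
            metric="precomputed", init="random", random_state=0)
tsne_results = tsne.fit_transform(D)
print(tsne_results.shape)  # (150, 2)
```

The trade-off is memory: the precomputed matrix grows quadratically with the number of samples, which is fine for a 10% sample but may not be for very large data.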

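Note: the other classes I mentioned at the top validate the entries of metric_params, but I haven't done that here, so make sure you pass the correct parameters in it, otherwise you will get errors.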
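A small check along those lines could be run before fitting. This is a hypothetical helper (`check_mahalanobis_params` is not a scikit-learn function), and the rule it enforces (exactly one of V or VI, given as a square matrix over the features, with no other keys) is an assumption chosen for this sketch:

```python
import numpy as np

def check_mahalanobis_params(metric_params, n_features):
    """Hypothetical pre-fit check for metric='mahalanobis'."""
    params = metric_params or {}
    keys = set(params) & {"V", "VI"}
    if len(keys) != 1:
        raise ValueError("pass exactly one of 'V' or 'VI' in metric_params")
    unknown = set(params) - {"V", "VI"}
    if unknown:
        raise ValueError("unexpected metric_params: %s" % sorted(unknown))
    M = np.asarray(params[keys.pop()], dtype=float)
    if M.shape != (n_features, n_features):
        raise ValueError("expected a (%d, %d) matrix, got %s"
                         % (n_features, n_features, M.shape))
    return params

pt = np.random.default_rng(0).normal(size=(40, 3))
params = check_mahalanobis_params({"V": np.cov(pt, rowvar=False)}, pt.shape[1])
print(sorted(params))  # ['V']
```

With `np.cov(pt)` instead, the helper raises a `ValueError`, since that is a 40 x 40 matrix over samples rather than features.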

You should open an issue about this on the scikit-learn issues page on GitHub.

For python - TSNE and **mahalanobis** metric from **sklearn**, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51766963/
