matlab - Assigning descriptors to cluster centers after clustering with VLFeat

Tags: matlab k-means nearest-neighbor vlfeat

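I am clustering my data with k-means, but instead of the standard algorithm I use an approximate nearest neighbor (ANN) algorithm to speed up the sample-to-center comparisons. This is easy to do: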

[clusterCenters, trainAssignments] = vl_kmeans(trainDescriptors, clusterCount, 'Algorithm', 'ANN', 'MaxNumComparisons', ceil(clusterCount / 50));

When I run this code, the variable "trainDescriptors" is clustered and each descriptor is assigned to one of the "clusterCenters" using ANN.

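I also have another variable, "testDescriptors", which I want to assign to the same cluster centers. This assignment must use the same method that was applied to "trainDescriptors", but AFAIK the vl_kmeans function does not return the tree it builds for fast assignment.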

So my question is: can "testDescriptors" be assigned to "clusterCenters" in the same way that "trainDescriptors" are assigned to "clusterCenters" inside the vl_kmeans function, and if so, how?

Best answer

Well, I figured it out. It can be done like this:

clusterCount = 1024;
datasetTrain = single(rand(128, 100000)); 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 1 - cluster train data and get train assignments
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[clusterCenters, trainAssignments_actual] = vl_kmeans(datasetTrain, clusterCount, ...
    'Algorithm', 'ANN', ...
    'Distance', 'l2', ...
    'NumRepetitions', 1, ...
    'NumTrees', 3, ...
    'MaxNumComparisons', ceil(clusterCount / 50) ...
);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2 - build a kd-forest over the cluster centers and assign train data to them
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

forest = vl_kdtreebuild(clusterCenters, ...
    'Distance', 'l2', ...
    'NumTrees', 3 ...
);

trainAssignments_expected = vl_kdtreequery(forest, clusterCenters, datasetTrain);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 3 - validate second assignment
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

validation = isequal(trainAssignments_actual, trainAssignments_expected);

In step 2, I build a new tree from the cluster centers and then assign the data to the centers again. It gives valid results.
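The answer validates the approach on the training data only, so here is a minimal sketch of the same idea applied to the test descriptors. It assumes VLFeat is on the MATLAB path (vl_setup has been run). The data below is random and purely illustrative, and the names testDescriptors, numTrees and maxComparisons are stand-ins. The 'MaxComparisons' option of vl_kdtreequery is used to mirror the comparison budget given to vl_kmeans; that option name is taken from the VLFeat documentation as I recall it, so check it against your installed version. Because the kd-forest is randomized and the search is approximate, the rebuilt forest is not guaranteed to reproduce the vl_kmeans assignments exactly, so the sketch reports an agreement fraction rather than relying on isequal.

```matlab
% Sketch only: requires VLFeat (run vl_setup first).
% All data here is random and stands in for real descriptors.
clusterCount   = 1024;
numTrees       = 3;
maxComparisons = ceil(clusterCount / 50);

trainDescriptors = rand(128, 100000, 'single');
testDescriptors  = rand(128, 20000, 'single');   % hypothetical test set

% Cluster the training data with ANN-accelerated k-means.
[clusterCenters, trainAssignments] = vl_kmeans(trainDescriptors, clusterCount, ...
    'Algorithm', 'ANN', ...
    'Distance', 'l2', ...
    'NumRepetitions', 1, ...
    'NumTrees', numTrees, ...
    'MaxNumComparisons', maxComparisons);

% Rebuild a kd-forest over the final centers with matching settings.
forest = vl_kdtreebuild(clusterCenters, ...
    'Distance', 'l2', ...
    'NumTrees', numTrees);

% Assign the test descriptors with the same comparison budget.
% ASSUMPTION: 'MaxComparisons' is the option name in your VLFeat version.
testAssignments = vl_kdtreequery(forest, clusterCenters, testDescriptors, ...
    'MaxComparisons', maxComparisons);

% Sanity check: re-assign the training data and measure how often the
% result matches what vl_kmeans returned.
trainReassigned = vl_kdtreequery(forest, clusterCenters, trainDescriptors, ...
    'MaxComparisons', maxComparisons);
agreement = mean(trainAssignments == trainReassigned);
fprintf('Agreement with vl_kmeans assignments: %.1f%%\n', 100 * agreement);
```

If exact assignments matter more than speed, omitting 'MaxComparisons' should make the query exhaustive, at the cost of no longer matching the approximate procedure used during clustering.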

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/24105949/
