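I have a set of 3 million vectors (300 dimensions each), and I'm looking for a new point in this 300-dimensional space that is approximately equidistant from all the other points (vectors).
What I could do is initialize a random vector v and run an optimization over v with the objective:
where d_xy is the distance between vector x and vector y, but this would be very computationally expensive.
I'm looking for an approximate solution vector for this problem that can be found quickly on very large sets of vectors. (Or any library that will do something like this for me, in any language.)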
Best answer
From this question on the Math StackExchange:
There is no point that is equidistant from 4 or more points in general position in the plane, or n+2 points in n dimensions.
Criteria for representing a collection of points by one point are considered in statistics, machine learning, and computer science. The centroid is the optimal choice in the least-squares sense, but there are many other possibilities.
The centroid is the point C in the plane for which the sum of squared distances $\sum |CP_i|^2$ is minimum. One could also optimize a different measure of centrality, or insist that the representative be one of the points (such as a graph-theoretic center of a weighted spanning tree), or assign weights to the points in some fashion and take the centroid of those.
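As a short aside that is not part of the quoted answer, the least-squares claim can be checked directly. Writing $\bar P = \frac{1}{N}\sum_{i=1}^{N} P_i$ for the centroid of $N$ points (the symbols $\bar P$ and $N$ are introduced here for illustration), the sum of squared distances from any point $C$ splits as:

$$
\sum_{i=1}^{N} |C - P_i|^2
= \sum_{i=1}^{N} |\bar P - P_i|^2 + N\,|C - \bar P|^2 ,
$$

because the cross term $2\,(C - \bar P)\cdot\sum_{i}(\bar P - P_i)$ vanishes by the definition of $\bar P$. The first term does not depend on $C$, so the sum is smallest exactly when $C = \bar P$. The same argument holds in any number of dimensions, including the 300 in the question.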
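Note in particular that "the centroid is the optimal choice in the least-squares sense", so the optimal solution to your cost function (which is a least-squares cost) is simply to average all the coordinates of your points (which gives you the centroid).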
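A minimal sketch of that averaging step, assuming the vectors are rows of a NumPy array. The helper names `centroid` and `sum_sq_dist`, the `chunk_size` parameter, and the small random test array are illustrative assumptions, not part of the original answer. Summing in chunks keeps the extra memory small when the array has millions of rows.

```python
import numpy as np

def centroid(vectors, chunk_size=100_000):
    """Mean of the rows of `vectors`, accumulated chunk by chunk in float64."""
    n, dim = vectors.shape
    total = np.zeros(dim, dtype=np.float64)
    for start in range(0, n, chunk_size):
        # Sum one slice of rows at a time so no full-size temporary is created.
        total += vectors[start:start + chunk_size].sum(axis=0, dtype=np.float64)
    return total / n

def sum_sq_dist(point, vectors):
    """Least-squares cost: sum of squared Euclidean distances to `point`."""
    diff = vectors - point
    return float(np.einsum("ij,ij->", diff, diff))

# Small random stand-in for the real data (hypothetical; the question has 3M rows).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 300)).astype(np.float32)

c = centroid(X)
# Moving away from the centroid can only increase the least-squares cost.
print(sum_sq_dist(c, X) <= sum_sq_dist(c + 0.01, X))
```

This is a single pass over the data, so the cost grows linearly with the number of vectors. Keep in mind that it minimizes the sum of squared distances; it does not by itself guarantee that the individual distances are equal.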
Source: "python - Find a vector that is approximately equally distant from all vectors in a set", a question on Stack Overflow: https://stackoverflow.com/questions/30777540/