python - Unsure about PCA in sklearn

I need to do some PCA with sklearn, and I want to make sure I am doing it the right way. Here is my code:

from sklearn.decomposition import PCA
pca = PCA(n_components=5)
pca_result = pca.fit_transform(data)

eigenvalues = pca.singular_values_
print(eigenvalues)

x = pca_result[:,0]
y = pca_result[:,1]

The data looks like this:

[[ -6.4186, -14.3534,  18.1296,  -2.8110,  14.0298],
[ -7.1220, -17.1501,  21.2807,  -3.5025,  16.4489],
[ -8.4652, -18.9316,  25.0303,  -4.1773,  18.5066],
...,
[ -4.7054,   6.1389,   3.5146,  -0.1036,  -0.7332],
[ -5.8533,   9.9087,   4.1178,  -0.5211,  -2.2415],
[ -6.2969,  13.8951,   3.4365,  -0.9207,  -4.2024]]

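These are the eigenvalues: [1005.2761, 853.5491, 65.058365, 49.994457, 10.277865]. I am not quite sure about the last two lines. I want to plot the data projected into 2D space, which seems to account for most of the variation in the data (basically a 2D plot of 5D data, since it looks like it lies on a 2D manifold). Am I doing this right? Thanks!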

Best answer

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components.

Such dimensionality reduction can be a very useful step for visualising and processing high-dimensional datasets, while still retaining as much of the variance in the dataset as possible. For example, selecting L = 2 and keeping only the first two principal components finds the two-dimensional plane through the high-dimensional dataset in which the data is most spread out, so if the data contains clusters these too may be most spread out, and therefore most visible to be plotted out in a two-dimensional diagram; whereas if two directions through the data (or two of the original variables) are chosen at random, the clusters may be much less spread apart from each other, and may in fact be much more likely to substantially overlay each other, making them indistinguishable.

https://en.wikipedia.org/wiki/Principal_component_analysis
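As a rough check on how much variance two components keep in this case, the values printed in the question can be plugged in directly. This is a minimal sketch, not part of the original answer: it treats the printed numbers as singular values (they come from `pca.singular_values_`), and it assumes all five components were kept, so the squared values sum to the total variance. The variable names are illustrative.

```python
import numpy as np

# Values printed in the question (the output of pca.singular_values_).
singular_values = np.array([1005.2761, 853.5491, 65.058365, 49.994457, 10.277865])

# The variance along each component is proportional to the squared singular value.
squared = singular_values ** 2

# Share of the total variance carried by each component
# (valid here because all 5 components of 5-feature data were kept).
variance_share = squared / squared.sum()

# Share retained when only the first two components are kept.
first_two_share = variance_share[:2].sum()

print(variance_share)
print(first_two_share)
```

With these numbers the first two components carry well over 99% of the variance, which supports plotting the data in 2D.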

So you need to run:

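from sklearn.decomposition import PCA

# Keep only the first two principal components.
pca = PCA(n_components=2)
# Fit on the data and project it; the result has shape (n_samples, 2).
pca_result = pca.fit_transform(data)

# Coordinates along the first and second principal components.
x = pca_result[:,0]
y = pca_result[:,1]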

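You then have the data projected onto a two-dimensional space, so your last two lines are correct. Note that pca.singular_values_ returns singular values, not eigenvalues; the eigenvalues of the covariance matrix are in pca.explained_variance_.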
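To read the per-component variances and the retained share from scikit-learn itself, a fitted `PCA` object exposes `explained_variance_` and `explained_variance_ratio_`. The following is a minimal sketch on synthetic data: the random array `data` is a hypothetical stand-in for the real dataset, which is not available here, and the noise level is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the real data: 200 samples with 5 features
# that lie close to a 2D plane, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

pca = PCA(n_components=2)
pca_result = pca.fit_transform(data)

# Variance along each kept component (eigenvalues of the covariance matrix).
eigenvalues = pca.explained_variance_

# The same quantity derived from the singular values.
from_singular = pca.singular_values_ ** 2 / (data.shape[0] - 1)

# Fraction of the total variance retained by the two components.
retained = pca.explained_variance_ratio_.sum()

print(eigenvalues)
print(retained)
```

If `retained` is close to 1 for the real data, a 2D scatter plot of `pca_result[:, 0]` against `pca_result[:, 1]` loses very little of the variation.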

Regarding "python - Unsure about PCA in sklearn", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59242252/
