I fit KMeans on features scaled with StandardScaler. The problem is that the cluster centers come back scaled as well. Is there a programmatic way to get the centroids on the original scale?
import pandas as pd
import numpy as np
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.feature import StandardScaler, StandardScalerModel
from pyspark.ml.clustering import KMeans
from sklearn.datasets import load_iris
# load the iris data set into a Spark DataFrame
iris = load_iris()
iris_data = pd.DataFrame(iris['data'], columns=iris['feature_names'])
iris_df = sqlContext.createDataFrame(iris_data)  # assumes a SQLContext is available (e.g. in a Spark shell)
assembler = VectorAssembler(
    inputCols=iris_df.columns, outputCol='features')
data = assembler.transform(iris_df)
scaler = StandardScaler(inputCol="features", outputCol="scaledFeatures", withStd=True, withMean=False)
scalerModel = scaler.fit(data)
scaledData = scalerModel.transform(data).drop('features').withColumnRenamed('scaledFeatures', 'features')
kmeans = KMeans().setFeaturesCol("features").setPredictionCol("prediction").setK(3)
model = kmeans.fit(scaledData)
centers = model.clusterCenters()
print("Cluster Centers: ")
for center in centers:
    print(center)
Here I would like to get the centers on the original scale. The centers below are scaled:
[ 7.04524479 6.17347978 2.50588155 1.88127377]
[ 6.0454109 7.88294475 0.82973422 0.31972295]
[ 8.22013841 7.19671468 3.13005178 2.59685552]
Best answer
You used withStd=True and withMean=False with the StandardScaler. To move back to the original space, you have to multiply by the std vector:
[cluster * scalerModel.std for cluster in model.clusterCenters()]
If withMean were True, you would use:
[cluster * scalerModel.std + scalerModel.mean
for cluster in model.clusterCenters()]
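As a quick sanity check of the math, here is a minimal sketch using NumPy and scikit-learn instead of Spark (so it runs without a SparkContext): dividing the features by their per-column std mimics `withStd=True, withMean=False`, and multiplying the fitted centers back by std recovers centroids on the original scale. Note that Spark's StandardScaler uses the sample standard deviation; any consistent std works for this round trip.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
std = X.std(axis=0, ddof=1)           # sample std, as Spark's StandardScaler computes it

scaled = X / std                      # withStd=True, withMean=False: divide by std only

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)

# undo the scaling: centers in scaled space times std = centers in original space
original_centers = km.cluster_centers_ * std
print(original_centers)
```

Because scaling is a per-feature linear map and each KMeans center is the mean of its assigned points, `original_centers[k]` equals the mean of the original-scale points assigned to cluster k, which is exactly the centroid you would have obtained had the clusters been formed on the original data.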
Related Stack Overflow question (python - Spark: get actual cluster centers when using StandardScaler): https://stackoverflow.com/questions/47705919/