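I have the following DataFrame with a column of double arrays, and I need to convert that column to a vector so I can pass it to an ML algorithm. Can anyone help me with this?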
fList: org.apache.spark.sql.DataFrame = [features: array<double>]
+--------------------------------------------------------------------------------+
|features |
+--------------------------------------------------------------------------------+
|[2.5046410000000003, 2.1487149999999997, 1.0884870000000002, 3.5877090000000003]|
|[0.9558040000000001, 0.9843780000000002, 0.545025, 0.9979860000000002] |
+--------------------------------------------------------------------------------+
Expected output: it should look like this.
fList: org.apache.spark.sql.DataFrame = [features: vector]
Best answer
I suggest you write a udf function:
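import scala.collection.mutable  // needed for mutable.WrappedArray below
import org.apache.spark.sql.functions._
import org.apache.spark.mllib.linalg.Vectors  // for spark.ml estimators (Spark 2.0+), use org.apache.spark.ml.linalg.Vectors instead
// UDF that turns an array<double> column into a dense vector
def convertArrayToVector = udf((features: mutable.WrappedArray[Double]) => Vectors.dense(features.toArray))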
and call that function through the withColumn API:
scala> df.withColumn("features", convertArrayToVector($"features"))
res1: org.apache.spark.sql.DataFrame = [features: vector]
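For reference, the same idea can be sketched as a small standalone program. This is only a sketch under some assumptions: it targets Spark 2.2 or later, uses org.apache.spark.ml.linalg (the vector type the DataFrame-based spark.ml estimators expect) instead of mllib, and declares the UDF parameter as Seq[Double] rather than mutable.WrappedArray[Double]. The names ArrayToVectorExample, toDense and toVector are made up for illustration, and the sample rows are rounded copies of the values in the question.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object ArrayToVectorExample {
  // Plain helper (hypothetical name): copies a sequence of doubles into a dense ML vector.
  def toDense(xs: Seq[Double]): Vector = Vectors.dense(xs.toArray)

  // UDF wrapper; Seq[Double] is the Scala-side type for an array<double> column.
  val toVector = udf((xs: Seq[Double]) => toDense(xs))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; a real job would get its session elsewhere.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ArrayToVectorExample")
      .getOrCreate()
    import spark.implicits._

    // Sample data shaped like the question's DataFrame: a single array<double> column.
    val fList = Seq(
      Seq(2.504641, 2.148715, 1.088487, 3.587709),
      Seq(0.955804, 0.984378, 0.545025, 0.997986)
    ).toDF("features")

    // Overwrite the array column with its vector version and inspect the schema.
    val converted = fList.withColumn("features", toVector($"features"))
    converted.printSchema()

    spark.stop()
  }
}
```

If everything is wired up correctly, the printed schema should report the features column as a vector rather than an array. On Spark 3.1 and later there is also a built-in array_to_vector function in org.apache.spark.ml.functions, which may make the UDF unnecessary.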
I hope this answer helps.
Regarding "scala - How to convert a DataFrame of double arrays to vectors?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47543747/