apache-spark - How are the label and features determined in Spark logistic regression?

Tags: apache-spark machine-learning

I am using Spark MLlib with a logistic regression model for classification. I followed this link: https://spark.apache.org/docs/2.1.0/ml-classification-regression.html#logistic-regression

import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Load training data
Dataset<Row> training = spark.read().format("libsvm")
  .load("data/mllib/sample_libsvm_data.txt"); // sample file used by the linked Spark example

LogisticRegression lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8);

// Fit the model
LogisticRegressionModel lrModel = lr.fit(training);

// Print the coefficients and intercept for logistic regression
System.out.println("Coefficients: "
  + lrModel.coefficients() + " Intercept: " + lrModel.intercept());

// We can also use the multinomial family for binary classification
LogisticRegression mlr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFamily("multinomial");

// Fit the model
LogisticRegressionModel mlrModel = mlr.fit(training);

If I give it a .csv file as input instead, I am not sure how the model identifies the label and the features. Can anyone explain?


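That works because you are loading the data in libsvm format, where every line has the form label index1:value1 index2:value2 ... The libsvm reader therefore already knows which value is the label and which are the features, and it gives you a DataFrame with a "label" column and a "features" vector column, which are the column names LogisticRegression looks for by default. A .csv file has no such convention, so you have to specify this yourself: assemble the feature columns into a single vector column (for example with VectorAssembler) and, if your column names differ from the defaults, point the model at them with setLabelCol and setFeaturesCol.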
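
To make the difference between the two formats concrete, here is a minimal sketch in plain Java (no Spark dependency). It is not Spark's own parser; the class name LabelFeatureFormats, both parse methods and the labelColumn parameter are hypothetical names made up for illustration. It only shows that a libsvm line carries its own label/feature layout, whereas for a CSV row the caller has to say which column is the label.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelFeatureFormats {

    /** One libsvm record: a label plus sparse features keyed by their index. */
    public static final class SparseExample {
        public final double label;
        public final Map<Integer, Double> features;

        SparseExample(double label, Map<Integer, Double> features) {
            this.label = label;
            this.features = features;
        }
    }

    /** One CSV record: a label plus dense features in column order. */
    public static final class DenseExample {
        public final double label;
        public final double[] features;

        DenseExample(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    // libsvm: the first token is the label, every later token is index:value.
    public static SparseExample parseLibsvmLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        double label = Double.parseDouble(tokens[0]);
        Map<Integer, Double> features = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            String[] pair = tokens[i].split(":", 2);
            features.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
        }
        return new SparseExample(label, features);
    }

    // CSV: nothing in the row marks the label, so the caller must pass
    // labelColumn (a hypothetical parameter, zero-based).
    public static DenseExample parseCsvRow(String row, int labelColumn) {
        String[] cells = row.trim().split(",");
        double label = Double.parseDouble(cells[labelColumn].trim());
        double[] features = new double[cells.length - 1];
        int next = 0;
        for (int i = 0; i < cells.length; i++) {
            if (i != labelColumn) {
                features[next++] = Double.parseDouble(cells[i].trim());
            }
        }
        return new DenseExample(label, features);
    }

    public static void main(String[] args) {
        SparseExample sparse = parseLibsvmLine("1 3:0.5 7:2.0");
        System.out.println(sparse.label + " " + sparse.features);
        // prints: 1.0 {3=0.5, 7=2.0}

        DenseExample dense = parseCsvRow("0.5,2.0,1", 2);
        System.out.println(dense.label + " " + Arrays.toString(dense.features));
        // prints: 1.0 [0.5, 2.0]
    }
}
```

In Spark itself, the labelColumn argument corresponds roughly to choosing the label column with setLabelCol, and the loop that collects the remaining cells corresponds to assembling the feature columns into one vector column.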

