I am using Spark MLlib with a Logistic Regression model for classification. I followed this link: https://spark.apache.org/docs/2.1.0/ml-classification-regression.html#logistic-regression
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
// Load training data
Dataset<Row> training = spark.read().format("libsvm")
    .load("data/mllib/sample_libsvm_data.txt");

LogisticRegression lr = new LogisticRegression()
    .setMaxIter(10)
    .setRegParam(0.3)
    .setElasticNetParam(0.8);

// Fit the model
LogisticRegressionModel lrModel = lr.fit(training);

// Print the coefficients and intercept for logistic regression
System.out.println("Coefficients: "
    + lrModel.coefficients() + " Intercept: " + lrModel.intercept());

// We can also use the multinomial family for binary classification
LogisticRegression mlr = new LogisticRegression()
    .setMaxIter(10)
    .setRegParam(0.3)
    .setElasticNetParam(0.8)
    .setFamily("multinomial");

// Fit the model
LogisticRegressionModel mlrModel = mlr.fit(training);
If I use a .csv file as input instead, I am not sure how the model identifies which column is the label and which are the features. Can anyone explain?
Best answer
Because you load the data in libsvm format, each line already has the structure `label index1:value1 index2:value2 ...`, so Spark knows which part is the label and which parts are the features. If you use a .csv file, that structure is gone, so you have to specify the columns yourself: assemble the feature columns into a single vector column, and tell the estimator which columns to use via `setLabelCol` and `setFeaturesCol`.
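A minimal sketch of the CSV path, in the same style as the snippet above. The file name `data/my_training_data.csv` and the column names `f1`, `f2`, `f3`, `label` are hypothetical; substitute your own schema:

```java
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("csv-lr").getOrCreate();

// Read the CSV; inferSchema makes the numeric columns numeric instead of string.
Dataset<Row> raw = spark.read()
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("data/my_training_data.csv");  // hypothetical path

// Combine the individual feature columns into one vector column,
// which is the input representation LogisticRegression expects.
Dataset<Row> training = new VectorAssembler()
    .setInputCols(new String[]{"f1", "f2", "f3"})  // hypothetical feature columns
    .setOutputCol("features")
    .transform(raw);

// Point the estimator at the label and feature columns explicitly
// ("label" and "features" happen to be the defaults).
LogisticRegression lr = new LogisticRegression()
    .setLabelCol("label")
    .setFeaturesCol("features");

LogisticRegressionModel model = lr.fit(training);
```

With libsvm input none of this is needed, because the loader produces the `label` and `features` columns for you.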
Regarding "apache-spark - How are the label and features determined in Spark logistic regression?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43652725/