python - pyspark - LinearRegression.load() throws NoSuchMethodException

Tags: python apache-spark machine-learning pyspark

I get the following error when trying to load a linear regression model:

Traceback (most recent call last):
  File "server.py", line 5, in <module>
    linReg = Model()
  File "/home/pyspark/Desktop/building_py_rec/lin_reg/ml_algo/model.py", line 23, in __init__
    self.model = LinearRegression.load('model_lin_reg')
  File "/home/pyspark/spark-2.1.0-bin-hadoop2.7/python/pyspark/ml/util.py", line 252, in load
    return cls.read().load(path)
  File "/home/pyspark/spark-2.1.0-bin-hadoop2.7/python/pyspark/ml/util.py", line 193, in load
    java_obj = self._jread.load(path)
  File "/home/pyspark/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/home/pyspark/spark-2.1.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/pyspark/spark-2.1.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o24.load.
: java.lang.NoSuchMethodException: org.apache.spark.ml.regression.LinearRegressionModel.<init>(java.lang.String)
    at java.lang.Class.getConstructor0(Class.java:3082)
    at java.lang.Class.getConstructor(Class.java:1825)
    at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:325)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

The relevant code is as follows:

import findspark
findspark.init()

import pyspark
from pyspark.sql import SparkSession
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType

X_Cols = ["Freq_Hz", "AoA_Deg", "Chord_m", "V_inf_mps", "displ_thick_m"]


class Model:

    spark = None
    model = None
    airfoil_assembler = None

    def __init__(self):
        print('[Model] Creating spark session...')
        self.spark = SparkSession.builder.appName('lin_reg_reader').getOrCreate()
        print('[Model] Loading model...')
        self.model = LinearRegression.load('model_lin_reg')
        print('[Model] Loading complete...')
        self.airfoil_assembler = VectorAssembler(inputCols=X_Cols, outputCol='features')

    def _getSchema(self):
        # Pass the fields as a list (not a set) so their order is preserved.
        schema = StructType([
            StructField("Freq_Hz", IntegerType(), False),
            StructField("AoA_Deg", IntegerType(), False),
            StructField("Chord_m", DoubleType(), False),
            StructField("V_inf_mps", DoubleType(), False),
            StructField("displ_thick_m", DoubleType(), False),
        ])
        return schema

    def _prepare_df(self):
        schema = self._getSchema()
        return df  # NOTE: `df` is never defined in this method

    def assemble(self, tup):
        schema = self._getSchema()
        df = self.spark.createDataFrame(tup, schema)
        assembled_vector = self.airfoil_assembler.transform(df)
        return assembled_vector

    def predict(self, airfoil):
        assembled_vector = self.assemble(tup=airfoil)
        # A fitted pyspark.ml model scores a DataFrame through transform().
        return self.model.transform(assembled_vector)

Notes

  • Spark version: 2.1.0
  • According to this link, a load function should be available.

Best answer

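You are using the wrong class. LinearRegression is the estimator that you fit; what was saved is the fitted LinearRegressionModel. Judging by the stack trace, the generic reader behind LinearRegression.load() tries to instantiate LinearRegressionModel through a constructor that takes a single String, and no such constructor exists, hence the NoSuchMethodException. To load the model, use LinearRegressionModel: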

from pyspark.ml.regression import LinearRegressionModel

LinearRegressionModel.load('model_lin_reg')
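
If it is unclear which class a saved directory belongs to, one option is to read the metadata that Spark ML writes next to the model data. The following is a minimal sketch, not part of the original answer. It assumes the model was saved to the local filesystem and that the directory follows the usual Spark ML layout, with a one-line JSON file at metadata/part-00000 containing a "class" field. The helper name saved_class_name is hypothetical.

```python
import json
from pathlib import Path


def saved_class_name(model_dir):
    """Return the JVM class name recorded in a saved Spark ML directory.

    Assumes the layout <model_dir>/metadata/part-00000 with one JSON line
    that has a "class" key (hypothetical helper, local filesystem only).
    """
    metadata_dir = Path(model_dir) / "metadata"
    # Hidden checksum files start with "." and are not matched by "part-*".
    for part in sorted(metadata_dir.glob("part-*")):
        text = part.read_text().strip()
        if text:
            return json.loads(text)["class"]
    raise FileNotFoundError("no metadata part file under %s" % metadata_dir)


# Example usage against the directory from the question:
# print(saved_class_name("model_lin_reg"))
```

If the reported name ends in LinearRegressionModel, call LinearRegressionModel.load() on that path. A name ending in LinearRegression would instead indicate an unfitted estimator, which is what LinearRegression.load() is meant for.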

Source: a similar question, "python - pyspark - LinearRegression.load() throws NoSuchMethodException", can be found on Stack Overflow: https://stackoverflow.com/questions/48693645/
