python - Different Python minimization functions give different values, why?

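I'm trying to learn Python by redoing the assignments from Andrew Ng's Machine Learning class, which were originally written in Octave (I took the class and got the certificate). I'm having trouble with the optimization functions. In the class they use fmincg, an Octave function that minimizes the cost function of linear regression (a convex function) given its derivative. They also teach you how to use gradient descent and the normal equation; in theory, if used correctly, all of these should give you the same result (to within a few decimal places). They all work well for linear regression, and I get the same results in Python.

To be clear, I'm trying to minimize the cost function to find the best-fit parameters (theta) for a dataset. So far I've used "nelder-mead", which doesn't require a derivative, and it gave me the solution closest to theirs. I've also tried "TNC", "CG", and "BFGS", all of which need the derivative to minimize the function. They all work fine when I have a first-degree polynomial (linear), but when I increase the degree of the polynomial, in my case to x^1 through x^8, I can't get my function to fit the dataset.

The exercise is very simple: I have 12 data points, so an 8th-degree polynomial should capture every point (if you're curious, it's an example of high variance, i.e. overfitting the data). The solution they show is a curve that passes through all the data points and captures everything, as expected. The best result I've gotten is with the "nelder-mead" method, which captured only two of the points in the dataset, while the other minimization functions gave me nothing even close to what I'm looking for. I'm not sure what's wrong, because my cost function and gradient give the correct values for the linear case, so I assume they work properly (exactly the same answers as Octave).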

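I'll list the functions in Octave and in Python, in the hope that someone can explain why I'm getting different answers, or point out an obvious mistake I'm not seeing.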

function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
%LINEARREGCOSTFUNCTION Compute cost and gradient for regularized linear 
%regression with multiple variables
%   [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lambda) computes the 
%   cost of using theta as the parameter for linear regression to fit the 
%   data points in X and y. Returns the cost in J and the gradient in grad


m = length(y); % number of training examples 
J = 0;
grad = zeros(size(theta));

htheta = X * theta;
n = length(theta); % number of parameters, including the bias term
J = 1 / (2 * m) * sum((htheta - y) .^ 2) + lambda / (2 * m) * sum(theta(2:n) .^ 2);

grad = 1 / m * X' * (htheta - y);
grad(2:n) = grad(2:n) + lambda / m * theta(2:n); % the bias term is not regularized
grad = grad(:);

end
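
For reference, the cost and gradient that the Octave function above computes can be written out as follows. This is only a transcription of the code into notation; the symbols are my assumptions: $\theta_0$ is the bias term (`theta(1)` in Octave's 1-based indexing), $x^{(i)}$ is the $i$-th row of `X`, and $h_\theta(x^{(i)}) = \theta^\top x^{(i)}$.

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
          + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

\frac{\partial J}{\partial \theta_0}
  = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_0^{(i)}

\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}
  + \frac{\lambda}{m}\,\theta_j \qquad (j \ge 1)
```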

Here are my code snippets; if anyone would like the full code, I can provide that too:

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

def costFunction(theta, Xcost, y, lmda):
    m = len(y)
    theta = theta.reshape((len(theta),1))
    htheta = np.dot(Xcost,theta) - y 
    J = 1 / (2 * m) * np.dot(htheta.T,htheta) + lmda / (2 * m) * np.sum(theta[1:,:]**2)
    return J

def gradCostFunc(gradtheta, X, y, lmda):
    m = len(y)
    gradtheta = gradtheta.reshape((len(gradtheta),1))
    hgradtheta = np.dot(X,gradtheta) - y 
    #gradtheta[0,0] = 0. 

    grad = (1 / m) * np.dot(X.T, hgradtheta)

    #for i in range(1,len(grad)):
    grad[1:,0] = grad[1:,0] + (lmda/m) * gradtheta[1:,0]
    return grad.reshape((len(grad)))

def normalEqn(X, y, lmda):
    e = np.eye(X.shape[1])
    e[0,0] = 0
    theta = np.dot(np.linalg.pinv(np.dot(X.T,X) + lmda * e),np.dot(X.T,y))
    return theta 

def gradientDescent(X, y, theta, alpha, lmda, num_iters):
    # calculate gradient descent in an iterative manner
    m = len(y)
    # J_history tracks the evolution of the cost function 
    J_history = np.zeros((num_iters,1))

    # Calculating the gradients 
    for i in range(0, num_iters):
        grad = np.zeros((len(theta),1))
        grad = gradCostFunc(theta, X, y, lmda)
        #updating the thetas 
        theta = theta - alpha * grad 
        J_history[i] = costFunction(theta, X, y, lmda)

    plt.plot(J_history)
    plt.show()

    return theta 

def trainLR(initheta, X, y, lmda):
    #print theta.shape, X.shape, y.shape, gradtest.shape gradCostFunc
    options = {'maxiter': 1000}
    res = optimize.minimize(costFunction, initheta, jac=gradCostFunc, method='CG',
                            args=(X, y, lmda), options=options)
    #res = optimize.minimize(costFunction, theta, method='nelder-mead',
    #                        args=(X,y,lmda), options={'disp': False})
    #res = optimize.fmin_bfgs(costFunction, theta, fprime=gradCostFunc, args=(X, y, lmda))
    return res.x

def polyFeatures(X, degree):
    # map the higher polynomials 
    out = X 
    if degree >= 2:
        for i in range(2,degree+1):
            out = np.column_stack((out,X**i))
        return out 
    else:
        return out

def featureNormalize(X):
    # Since the values will vary by orders of magnitudes 
    # It’s important to normalize the various features 
    mu = np.mean(X, axis=0)
    S1 = np.std(X, axis=0)
    return mu, S1, (X - mu)/S1

Here are the main calls to these functions:

X, y, Xval, yval, Xtest, ytest = loadData('ex5data1.mat')
X_poly = X # to be used later on in the program 
p = 8 
X_poly = polyFeatures(X_poly, p)
mu, sigma, X_poly = featureNormalize(X_poly)
X_poly = padding(X_poly)
theta = np.zeros((X_poly.shape[1],1))
theta = trainLR(theta, X_poly, y, 0.)
#theta = normalEqn(X_poly, y, 0.)
#theta = gradientDescent(X_poly, y, theta, 0.1, 0, 1500)
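
One way to narrow down a discrepancy like this, offered as a sketch under assumptions and not as a diagnosis of the code above: check the analytic gradient against finite differences with `scipy.optimize.check_grad`, then compare the optimizer's result with a closed-form least-squares solution. Everything below is hypothetical stand-in code. The data is synthetic (not `ex5data1.mat`), `cost` and `grad` are rewritten to use 1-D arrays and return a plain float, and `lmda` is fixed at 0 so that `np.linalg.lstsq` gives the reference minimum.

```python
import numpy as np
from scipy import optimize


def cost(theta, X, y, lmda):
    """Regularized least-squares cost; theta and y are 1-D, result is a float."""
    m = y.size
    r = X @ theta - y
    return (r @ r) / (2.0 * m) + lmda / (2.0 * m) * np.sum(theta[1:] ** 2)


def grad(theta, X, y, lmda):
    """Gradient of cost(); the bias term theta[0] is not regularized."""
    m = y.size
    g = X.T @ (X @ theta - y) / m
    g[1:] += lmda / m * theta[1:]
    return g


# Synthetic stand-in data (assumption): 12 points, degree-8 polynomial features.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(12)
P = np.column_stack([x ** k for k in range(1, 9)])
P = (P - P.mean(axis=0)) / P.std(axis=0)   # feature normalization
X = np.column_stack([np.ones(12), P])      # prepend the bias column

theta0 = np.zeros(X.shape[1])
lmda = 0.0

# 1. Finite-difference check of the analytic gradient (should be near zero).
grad_err = optimize.check_grad(cost, grad, theta0, X, y, lmda)

# 2. Closed-form reference; valid as the minimizer only because lmda == 0 here.
theta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3. Iterative optimizer; inspect success/message, not just res.x.
res = optimize.minimize(cost, theta0, args=(X, y, lmda), jac=grad,
                        method="BFGS", options={"maxiter": 1000})

print("gradient check error:", grad_err)
print("optimizer converged:", res.success, "|", res.message)
print("cost (closed form):", cost(theta_ref, X, y, lmda))
print("cost (optimizer):  ", cost(res.x, X, y, lmda))
```

If the gradient check is clean but the optimizer's cost stays well above the closed-form cost, the optimizer probably stopped early (see `res.message`), and the cost or gradient code is less likely to be at fault.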

Best answer

My answer may be beside the point, since your question is really about getting help debugging your current implementation.

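That said, if you're interested in using an off-the-shelf optimizer in Python, take a look at OpenOpt. The library contains reasonably performant implementations of optimizers for a variety of optimization problems.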

I should also mention that the scikit-learn library provides a nice set of machine learning tools for Python.
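
As a rough illustration of the scikit-learn route, here is a minimal sketch of a degree-8 polynomial fit on 12 points. The data is synthetic and only stands in for the course dataset, and the pipeline layout is my assumption, not something taken from the question. `LinearRegression` solves the unregularized least-squares problem directly, which corresponds to `lmda = 0` in the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption): 12 one-dimensional samples.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12).reshape(-1, 1)
y = np.sin(3.0 * x).ravel() + 0.1 * rng.standard_normal(12)

# Polynomial expansion, feature scaling, then ordinary least squares.
model = make_pipeline(
    PolynomialFeatures(degree=8, include_bias=False),  # x^1 .. x^8
    StandardScaler(),                                   # per-feature normalization
    LinearRegression(),                                 # fits the intercept itself
)
model.fit(x, y)

r2 = model.score(x, y)  # R^2 on the training points
print("training R^2:", r2)
```

For a nonzero regularization strength, `sklearn.linear_model.Ridge` adds an L2 penalty, though its `alpha` is scaled differently from the `lambda / (2 * m)` term in the course's cost function.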

Regarding "python - Different Python minimization functions give different values, why?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20711662/
