c++ - Batch gradient descent algorithm does not converge

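I am trying to implement the batch gradient descent algorithm for my machine learning homework. I have a training set whose x values are around 10^3 and whose y values are around 10^6. I am trying to find the values of [theta0, theta1] that make y = theta0 + theta1 * x converge. I set the learning rate to 0.0001 and the maximum number of iterations to 10. Here is my code in Qt.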

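// Fits y = theta0 + theta1 * x by batch gradient descent.
// MAX_ITERATION, LRATE and THRESHOLD are assumed to be defined elsewhere.
QVector<double> gradient_descent_batch(QVector<double> x, QVector<double> y)
{
    QVector<double> theta(0);
    theta.resize(2);

    int size = x.size();

    // Initial guess for both parameters.
    theta[1] = 0.1;
    theta[0] = 0.1;

    for (int j=0;j<MAX_ITERATION;j++)
    {
        // Gradient sums over the whole training set (not averaged).
        double dJ0 = 0.0;
        double dJ1 = 0.0;

        for (int i=0;i<size;i++)
        {
            dJ0 += (theta[0] + theta[1] * x[i] - y[i]);
            dJ1 += (theta[0] + theta[1] * x[i] - y[i]) * x[i];
        }

        // Keep the old values so both parameters are updated simultaneously.
        double theta0 = theta[0];
        double theta1 = theta[1];
        theta[0] = theta0 - LRATE * dJ0;
        theta[1] = theta1 - LRATE * dJ1;

        // Stop early once both parameters change by less than THRESHOLD.
        if (qAbs(theta0 - theta[0]) < THRESHOLD && qAbs(theta1 - theta[1]) < THRESHOLD)
            return theta;
    }

    return theta;
}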

I print the value of theta on each iteration, and this is the result.

QVector(921495, 2.29367e+09) 
QVector(-8.14503e+12, -1.99708e+16) 
QVector(7.09179e+19, 1.73884e+23) 
QVector(-6.17475e+26, -1.51399e+30) 
QVector(5.3763e+33, 1.31821e+37) 
QVector(-4.68109e+40, -1.14775e+44) 
QVector(4.07577e+47, 9.99338e+50) 
QVector(-3.54873e+54, -8.70114e+57) 
QVector(3.08985e+61, 7.57599e+64) 
QVector(-2.6903e+68, -6.59634e+71)

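It seems that theta never converges. Following the solution here, I set the learning rate to 0.00000000000001 and the maximum number of iterations to 20, but it still does not seem to converge. Here is the result.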

QVector(0.100092, 0.329367) 
QVector(0.100184, 0.558535) 
QVector(0.100276, 0.787503) 
QVector(0.100368, 1.01627) 
QVector(0.10046, 1.24484) 
QVector(0.100552, 1.47321) 
QVector(0.100643, 1.70138) 
QVector(0.100735, 1.92936) 
QVector(0.100826, 2.15713) 
QVector(0.100918, 2.38471) 
QVector(0.101009, 2.61209) 
QVector(0.1011, 2.83927) 
QVector(0.101192, 3.06625) 
QVector(0.101283, 3.29303) 
QVector(0.101374, 3.51962) 
QVector(0.101465, 3.74601) 
QVector(0.101556, 3.9722) 
QVector(0.101646, 4.1982) 
QVector(0.101737, 4.424) 
QVector(0.101828, 4.6496)

What is wrong?

Best answer

First of all, your algorithm looks fine, except that you should divide LRATE by size:

theta[0] = theta0 - LRATE * dJ0 / size;
theta[1] = theta1 - LRATE * dJ1 / size;
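
To illustrate what dividing by size does, here is a minimal sketch of the averaged gradient on its own. It uses std::vector instead of QVector so that it compiles without Qt, and the names Gradient and mean_gradient are hypothetical, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averaged gradient of the squared-error cost for y = theta0 + theta1 * x.
struct Gradient {
    double d0;  // partial derivative with respect to theta0
    double d1;  // partial derivative with respect to theta1
};

// Hypothetical helper: sums the per-sample gradients and divides by the
// number of samples, so the step size no longer grows with the data set.
Gradient mean_gradient(const std::vector<double>& x,
                       const std::vector<double>& y,
                       double theta0, double theta1)
{
    assert(x.size() == y.size() && !x.empty());
    double sum0 = 0.0;
    double sum1 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double residual = theta0 + theta1 * x[i] - y[i];
        sum0 += residual;
        sum1 += residual * x[i];
    }
    const double m = static_cast<double>(x.size());
    return {sum0 / m, sum1 / m};
}
```

One update step would then be theta0 -= LRATE * g.d0 and theta1 -= LRATE * g.d1, where g is the value returned for the current parameters.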

I would suggest that you compute the cost function and monitor it:

Cost function
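
Assuming the cost function meant here is the usual mean squared error for linear regression (an assumption, with m denoting `size`), it can be written as follows. Its partial derivatives match dJ0 / size and dJ1 / size from the update rule above.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right)^2

\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right),
\qquad
\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i - y_i \right) x_i
```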

Your cost should decrease on every iteration. If it bounces back and forth, your learning rate is too large. I would suggest using 0.01 and running 400 iterations.
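
Monitoring the cost could look like the sketch below. It is written with std::vector rather than QVector so that it compiles without Qt, and cost, standardize and descend are hypothetical helpers, not part of the original code. One assumption worth flagging: with x values around 10^3, a rate of 0.01 would most likely still overshoot even after dividing by size, so the sketch rescales x to zero mean and unit variance first. That preprocessing step is an addition here, not something the answer states.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean squared error cost: J = 1/(2m) * sum((theta0 + theta1*x_i - y_i)^2).
double cost(const std::vector<double>& x, const std::vector<double>& y,
            double theta0, double theta1)
{
    assert(x.size() == y.size() && !x.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double r = theta0 + theta1 * x[i] - y[i];
        sum += r * r;
    }
    return sum / (2.0 * static_cast<double>(x.size()));
}

// Assumed preprocessing: rescale x to zero mean and unit standard deviation.
std::vector<double> standardize(const std::vector<double>& x)
{
    assert(!x.empty());
    const double m = static_cast<double>(x.size());
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= m;
    double var = 0.0;
    for (double v : x) var += (v - mean) * (v - mean);
    const double stddev = std::sqrt(var / m);
    assert(stddev > 0.0);
    std::vector<double> out;
    out.reserve(x.size());
    for (double v : x) out.push_back((v - mean) / stddev);
    return out;
}

// Runs batch gradient descent with the averaged gradient and returns the
// cost recorded after each iteration, so it can be printed or plotted.
std::vector<double> descend(const std::vector<double>& x,
                            const std::vector<double>& y,
                            double rate, int iterations,
                            double& theta0, double& theta1)
{
    assert(x.size() == y.size() && !x.empty());
    const double m = static_cast<double>(x.size());
    std::vector<double> history;
    for (int j = 0; j < iterations; ++j) {
        double d0 = 0.0;
        double d1 = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double r = theta0 + theta1 * x[i] - y[i];
            d0 += r;
            d1 += r * x[i];
        }
        // Both sums use the old parameters, so the update is simultaneous.
        theta0 -= rate * d0 / m;
        theta1 -= rate * d1 / m;
        history.push_back(cost(x, y, theta0, theta1));
    }
    return history;
}
```

Calling descend on standardize(x) with a rate of 0.01 and 400 iterations returns a history that should shrink at every step. Note that the resulting theta values then refer to the rescaled x, not the raw one.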

Regarding c++ - Batch gradient descent algorithm does not converge, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35968603/
