machine-learning - Representing linear regression features as numbers for gradient descent

Tags: machine-learning linear-regression gradient-descent

The following Python code works well for gradient descent:

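import numpy as np

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)  # predictions for all m samples
        loss = hypothesis - y  # residuals
        cost = np.sum(loss ** 2) / (2 * m)  # half the mean squared error
        print("Iteration %d | Cost: %f" % (i, cost))
        gradient = np.dot(xTrans, loss) / m  # gradient of the cost with respect to theta
        theta = theta - alpha * gradient  # step against the gradient
    return theta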

Here, x is an m × n feature matrix (m = number of samples, n = number of features).

However, if my features are non-numeric (for example, director and genre) for 2 movies, my feature matrix might look like this:

[['Peter Jackson', 'Action'],
 ['Sergio Leone', 'Comedy']]

In this case, how do I map these features to numeric values and apply gradient descent?

Best answer

You can map each feature to numeric values of your choosing and then apply gradient descent in the usual way.

In Python, you can do this easily with pandas:

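import pandas as pd
# X is the list of [director, genre] rows from the question
df = pd.DataFrame(X, columns=['director', 'genre'])  # the names must be passed as columns, not as the index
df['director'] = df['director'].map({'Peter Jackson': 0, 'Sergio Leone': 1})
df['genre'] = df['genre'].map({'Action': 0, 'Comedy': 1})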

As you can see, writing these mappings by hand can get quite complicated, so it is better to write code that builds them dynamically.
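One way to build the mapping dynamically is sketched below. It is not part of the original answer; it assumes pandas and NumPy are available, and it reuses the two rows from the question as sample data. `pd.factorize` assigns integer codes in order of first appearance, which matches the hand-written dictionaries. Integer codes impose an arbitrary ordering on the categories, so the sketch also shows one-hot encoding with `pd.get_dummies` as a possible alternative. The names `codes`, `mappings`, `x_codes` and `x_one_hot` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample rows taken from the question.
X = [
    ["Peter Jackson", "Action"],
    ["Sergio Leone", "Comedy"],
]
df = pd.DataFrame(X, columns=["director", "genre"])

# Option 1: integer codes, built without hand-written dictionaries.
# pd.factorize numbers each distinct value in order of first appearance.
codes = df.copy()
mappings = {}  # remembers which label each integer code stands for
for col in df.columns:
    codes[col], uniques = pd.factorize(df[col])
    mappings[col] = list(uniques)

# Option 2 (an alternative, not from the original answer): one-hot encoding,
# which creates one 0/1 column per distinct category.
one_hot = pd.get_dummies(df, dtype=float)

# Either result can be converted to a numeric m x n matrix.
x_codes = codes.to_numpy(dtype=float)
x_one_hot = one_hot.to_numpy()

print(mappings)
print(x_codes)
print(list(one_hot.columns))
```

Either matrix can then be passed as `x` to a gradient descent routine such as the one in the question, typically after adding a column of ones for the intercept.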

A similar question about machine-learning - Representing linear regression features as numbers for gradient descent can be found on Stack Overflow: https://stackoverflow.com/questions/33631203/
