Vectorizing a Python for loop with NumPy

Here is a code snippet that trains a model built with the Keras library:

    for state, action, reward, next_state, done in minibatch:
        target = reward
        if not done:
            target = (reward + self.gamma *
                      np.amax(self.model.predict(next_state)[0]))
        target_f = self.model.predict(state)
        #print (target_f)
        target_f[0][action] = target
        self.model.fit(state, target_f, epochs=1, verbose=0)

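I am trying to vectorize it. I think the only way to do so is:

1. Create a NumPy table in which each row is (state, action, reward, next_state, done, target), so the table has as many rows as the minibatch has samples.
2. Update the target column based on the other columns (using a masked array):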

    target[done == True] = reward
    target[done == False] = reward + self.gamma * np.amax(self.model.predict(next_state)[0])

Then update the self.model.fit(state, target_f, epochs=1, verbose=0) call accordingly.

Note: the state is 8-dimensional, so each state vector has 8 elements.

Despite hours of effort, I have not been able to code this correctly. Is it actually possible to vectorize this code?

Best answer

You are very close! Assuming minibatch is an np.array:

First, find all the rows where done is true. Suppose done is stored in column index 4.

    minibatch_done = minibatch[np.where(minibatch[:, 4] == True)]
    minibatch_not_done = minibatch[np.where(minibatch[:, 4] == False)]
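As a small illustration of the split above, here is a minimal sketch that builds a toy object array with the same assumed column layout (state, action, reward, next_state, done) and separates terminal from non-terminal rows with a boolean mask. The toy data and the helper name `split_by_done` are hypothetical and not part of the original code; for this purpose, boolean indexing selects the same rows as the np.where form.

```python
import numpy as np

def split_by_done(minibatch, done_col=4):
    """Return (done_rows, not_done_rows) of a 2-D object array."""
    # Cast the done column to bool so the mask also works for object arrays.
    done_mask = minibatch[:, done_col].astype(bool)
    return minibatch[done_mask], minibatch[~done_mask]

# Toy minibatch: each row is (state, action, reward, next_state, done).
rows = [
    (np.zeros((1, 8)), 0, 1.0, np.ones((1, 8)), False),
    (np.zeros((1, 8)), 1, -1.0, np.ones((1, 8)), True),
    (np.zeros((1, 8)), 2, 0.5, np.ones((1, 8)), False),
]
# Fill an object array cell by cell so each state stays a separate array.
minibatch = np.empty((len(rows), 5), dtype=object)
for i, row in enumerate(rows):
    for j, item in enumerate(row):
        minibatch[i, j] = item

minibatch_done, minibatch_not_done = split_by_done(minibatch)
print(minibatch_done.shape, minibatch_not_done.shape)
```

The mask itself can also be kept around instead of splitting, which makes it easier to write results back in the original row order.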

Now we use this to compute the targets conditionally. Suppose column index 2 is reward and column index 3 is next_state:

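    target = np.empty(minibatch.shape[0])
    n_done = minibatch_done.shape[0]
    # First part (index 0...n_done): terminal rows, where the target is just the reward
    target[:n_done] = minibatch_done[:, 2]
    # Remaining rows are not done: add the discounted best next-state value per row
    next_states = np.vstack(minibatch_not_done[:, 3])
    target[n_done:] = (minibatch_not_done[:, 2]
                       + self.gamma * np.amax(self.model.predict(next_states), axis=1))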

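That should be what you are after :) Note that target is ordered with the done rows first and the not-done rows after them, not in the original minibatch order.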

Edit: fixed an indexing error in target.
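For comparison, here is a minimal sketch of an order-preserving variant that uses np.where on a boolean done array instead of splitting the minibatch. It assumes the states have already been stacked into arrays of shape (N, 8), and it uses a hypothetical `StandInModel` class in place of the Keras model so that the example runs without Keras; `build_targets` is likewise an illustrative name, not part of the original code.

```python
import numpy as np

class StandInModel:
    """Hypothetical stand-in for the Keras model: maps (N, 8) states to (N, 4) Q-values."""
    def __init__(self, n_inputs=8, n_actions=4, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(n_inputs, n_actions))

    def predict(self, states):
        return states @ self.weights

def build_targets(model, states, actions, rewards, next_states, dones, gamma):
    """Vectorized equivalent of the per-sample target computation."""
    # Best next-state value for every row, from one batched predict call.
    next_q_max = np.amax(model.predict(next_states), axis=1)
    # Terminal rows keep the bare reward; the others add the discounted value.
    targets = np.where(dones, rewards, rewards + gamma * next_q_max)
    # Start from the current predictions and overwrite only the taken action.
    target_f = model.predict(states)
    target_f[np.arange(len(actions)), actions] = targets
    return target_f

# Toy batch of 5 transitions with 8-dimensional states and 4 actions.
rng = np.random.default_rng(1)
n = 5
states = rng.normal(size=(n, 8))
next_states = rng.normal(size=(n, 8))
actions = rng.integers(0, 4, size=n)
rewards = rng.normal(size=n)
dones = np.array([False, True, False, False, True])

model = StandInModel()
target_f = build_targets(model, states, actions, rewards, next_states, dones, gamma=0.95)
print(target_f.shape)
```

With a real Keras model, the result could then be passed to a single self.model.fit(states, target_f, epochs=1, verbose=0) call. If each stored state has shape (1, 8), as the [0] indexing in the question suggests, np.vstack over the list of states would produce the (N, 8) batch. One caveat: the original loop fits after every sample, so later predictions see updated weights, whereas the batched version predicts with one set of weights and fits once, so the two will not give numerically identical training.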

A similar question about vectorizing a Python for loop with NumPy can be found on Stack Overflow: https://stackoverflow.com/questions/50975430/
