python - 值在 for 循环内自动更新

标签 python python-3.x python-2.7 loops while-loop

我试图实现值迭代算法。 我有一个网格

grid = [[0, 0, 0, +1],
    [0, "W", 0, -1],
    [0, 0, 0, 0]]

Action 列表

actlist = {UP:1, DOWN:2, LEFT:3, RIGHT:4}

还有奖励函数

reward = [[0, 0, 0, 0],
      [0, 0, 0, 0],
      [0, 0, 0,  0]] 

我编写了一个函数 T,它返回 3 个元组的元组。

def T(i,j,actions):
if(i == 0 and j == 0):
    if(actions == UP):
        return (i,i,0.8),(i,i,0.1),(i,j+1,0.1)
    elif(actions == DOWN):
        return (i+1,j,0.8),(i,j,0.1),(i,j+1,0.1)
    elif(actions == LEFT):
        return (i,j,0.8),(i,j,0.1),(i+1,j,0.1)
    elif(actions == RIGHT):
        return (i,j+1,0.8),(i,i,0.1),(i+1,j,0.1)
elif (i == 0 and j == 1):
    if(actions == UP):
        return (i,i,0.8),(i,j-1,0.1),(i,j+1,0.1)
    elif(actions == DOWN):
        return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
    elif(actions == LEFT):
        return (i,j-1,0.8),(i,j,0.1),(i,j,0.1)
    elif(actions == RIGHT):
        return (i,j+1,0.8),(i,j,0.1),(i,j,0.1)
elif(i == 0 and j == 2):
    if(actions == UP):
        return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
    elif(actions == DOWN):
        return(i+1,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
    elif(actions == LEFT):
        return (i,j-1,0.8),(i,j,0.1),(i+1,j,0.1)
    elif(actions == RIGHT):
        return (i,j+1,0.8),(i,j,0.1),(i+1,j,0.1)
elif(i == 0 and j == 3):
    if(actions == UP):
        return (-1,-1,0.8),(-1,-1,0.1),(-1,-1,0.1)
    elif(actions == DOWN):
        return (-1,-1,0.8),(-1,-1,0.1),(-1,-1,0.1)
    elif(actions == LEFT):
        return (-1,-1,0.8),(-1,-1,0.1),(-1,-1,0.1)
    elif(actions == RIGHT):
        return (-1,-1,0.8),(-1,-1,0.1),(-1,-1,0.1)
# 2nd row
elif (i == 1 and j == 0):
    if(actions == UP):
        return (i-1,j,0.8),(i,j,0.1),(i,j,0.1)
    elif(actions == DOWN):
        return (i+1,j,0.8),(i,j,0.1),(i,j,0.1)
    elif(actions == LEFT):
        return (i,j,0.8),(i-1,j,0.1),(i+1,j,0.1)
    elif(actions == RIGHT):
        return (i,j,0.8),(i-1,j,0.1),(i+1,j,0.1)
elif(i == 1 and j ==1):
    if(actions == UP):
        return (i,j,0.8),(i,j,0.1),(i,j,0.1)
    elif(actions == DOWN):
        return (i,j,0.8),(i,j,0.1),(i,j,0.1)
    elif(actions == LEFT):
        return (i,j,0.8),(i,j,0.1),(i,j,0.1)
    elif(actions == RIGHT):
        return (i,j,0.8),(i,j,0.1),(i,j,0.1)
elif (i == 1 and j == 2):
    if(actions == UP):
        return (i-1,j,0.8),(i,j,0.1),(i,j+1,0.1)
    elif(actions == DOWN):
        return (i+1,j,0.8),(i,j,0.1),(i,j+1,0.1)
    elif(actions == LEFT):
        return (i,j,0.8),(i-1,j,0.1),(i+1,j,0.1)
    elif(actions == RIGHT):
        return (i,j+1,0.8),(i-1,j,0.1),(i+1,j,0.1)
elif(i == 1 and j == 3):
    if(actions == UP):
        return (-2,-2,0.8),(-2,-2,0.1),(-2,-2,0.1)
    elif(actions == DOWN):
        return (-2,-2,0.8),(-2,-2,0.1),(-2,-2,0.1)
    elif(actions == LEFT):
        return (-2,-2,0.8),(-2,-2,0.1),(-2,-2,0.1)
    elif(actions == RIGHT):
        return (-2,-2,0.8),(-2,-2,0.1),(-2,-2,0.1)      
# 3rd row
elif(i == 2 and j == 0):
    if(actions == UP):
        return (i-1,j,0.8),(i,j,0.1),(i,j+1,0.1)
    elif(actions == DOWN):
        return (i,j,0.8),(i,j,0.1),(i,j+1,1,0.1)
    elif(actions == LEFT):
        return (i,j,0.8),(i-1,j,0.1),(i,j,0.1)
    elif(actions == RIGHT):
        return (i,j+1,0.8),(i-1,j,0.1),(i,j,0.1)
elif (i == 2 and j == 1):
    if(actions == UP):
        return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
    elif(actions == DOWN):
        return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
    elif(actions == LEFT):
        return (i,j-1,0.8),(i,j,0.1),(i,j,0.1)
    elif(actions == RIGHT):
        return (i,j+1,0.8),(i,j,0.1),(i,j,0.1)
elif(i == 2 and j == 2):
    if(actions == UP):
        return (i-1,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
    elif(actions == DOWN):
        return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
    elif(actions == LEFT):
        return (i,j-1,0.8),(i-1,j,0.1),(i,j,1)
    elif(actions == RIGHT):
        return (i,j+1,0.8),(i-1,j,0.1),(i,j,0.1)
elif(i == 2 and j == 3):
    if(actions == UP):
        return (i-1,j,0.8),(i,j-1,0.1),(i,j,0.1)
    elif(actions == DOWN):
        return (i,j,0.8),(i,j-1,0.1),(i,j,0.1)
    elif(actions == LEFT):
        return (i,j-1,0.8),(i-1,j,0.1),(i,j,0.1)
    elif(actions == RIGHT):
        return (i,j,0.8),(i-1,j,0.1),(i,j,0.1) 

该函数在值迭代函数中调用:

def value_iteration():
U1 = [[0, 0, 0, 0],
      [0, 0, 0, 0],
      [0, 0, 0, 0]] 
while True:
    U=U1.copy()
    delta = 0
    for i in range(len(grid)):
        for j in range(len(grid[i])):
            U1[i][j] = max(sum(p*(R(k,l)+gamma*U[k][l]) for (k,l,p) in T(i,j,a)) for a in actlist)
            print(i,j,U1[i][j],U[i][j])
            delta = max(delta, abs(U1[i][j] - U[i][j]))
    if delta <= epsilon*(1 - gamma)/gamma:
        return U

我正在更新

U=U1.copy()

在 while 循环中。

问题是,输出如下所示:

0 0 0.0 0.0
0 1 0.0 0.0
0 2 0.0 0.0
0 3 1.0 1.0
1 0 0.0 0.0
1 2 0.0 0.0
1 3 -1.0 -1.0
2 0 0.0 0.0
2 1 0.0 0.0
2 2 0.7000000000000001 0.7000000000000001
2 3 0.9630000000000001 0.9630000000000001

但我没有在 for 循环内更新 U。 U 应该保持不变(这意味着全为零),而 U1 应该只发生变化。为什么在 for 循环中 U 会自动设置为 U1 的值?

最佳答案

U1(和 U)是列表的列表,实际上是对列表的引用的列表。

您正在(浅)复制外部列表,但副本的内容仍然是对相同内部列表的引用。

尝试:

import copy
U = copy.deepcopy(U1)

看看会发生什么。 deepcopy 将正确地递归复制列表。

关于python - 值在 for 循环内自动更新,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51710667/

相关文章:

python - 如何检查输入的字母是否在列表中?

python - 如何通过可变数量增加对象属性

python - python 的 heapq.merge 使用的算法是什么?

python - 类型错误 : not all arguments converted during string formatting 11

python - 使用 add_edge_list() 方法创建图形的最佳方法是什么?

Python脚本无法找到下载文件夹并打开文件

python - Karger 在 python 2.7 中的最小切割实现没有给出正确的切割

Python:如何创建 for 循环来更改 dict 值并将其附加到列表中?

python - img 不是 numpy 数组,也不是标量

if-statement - IF 语句真值条件