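python - OpenAI Gym and Keras-RL: DQN expects a model that has one dimension for each action

Tags: python keras reinforcement-learning openai-gym

I am trying to set up a Deep Q-Learning agent with a custom environment in OpenAI Gym. I have 4 continuous state variables, each with its own limits, and 3 integer action variables, each with its own limits.

Here is the code: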

#%% import 
from gym import Env
from gym.spaces import Discrete, Box, Tuple
import numpy as np


#%%
class Custom_Env(Env):

    def __init__(self):
        # Define the state space

        # State variables
        self.state_1 = 0
        self.state_2 = 0
        self.state_3 = 0
        self.state_4_currentTimeSlots = 0

        # Define the gym components.
        # np.int was removed in NumPy 1.24, so an explicit integer dtype is used here.
        self.action_space = Box(low=np.array([0, 0, 0]), high=np.array([10, 20, 27]), dtype=np.int64)

        self.observation_space = Box(low=np.array([20, -20, 0, 0]), high=np.array([22, 250, 100, 287]), dtype=np.float16)

    def step(self, action ):

        # Update state variables
        self.state_1 = self.state_1 + action [0]
        self.state_2 = self.state_2 + action [1]
        self.state_3 = self.state_3 + action [2]

        #Calculate reward
        reward = self.state_1 + self.state_2 + self.state_3
       
        #Set placeholder for info
        info = {}    
        
        #Check if it's the end of the day
        done = self.state_4_currentTimeSlots >= 287
        
        #Move to the next timeslot 
        self.state_4_currentTimeSlots +=1

        state = np.array([self.state_1,self.state_2, self.state_3, self.state_4_currentTimeSlots ])

        #Return step information
        return state, reward, done, info
        
    def render (self):
        pass
    
    def reset(self):
        self.state_1 = 0
        self.state_2 = 0
        self.state_3 = 0
        self.state_4_currentTimeSlots = 0
        state = np.array([self.state_1, self.state_2, self.state_3, self.state_4_currentTimeSlots])
        return state

#%% Set up the environment
env = Custom_Env()

#%% Create a deep learning model with keras


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

def build_model(states, actions):
    model = Sequential()
    model.add(Dense(24, activation='relu', input_shape=states))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions[0] , activation='linear'))
    return model

states = env.observation_space.shape 
actions = env.action_space.shape 
print("env.observation_space: ", env.observation_space)
print("env.observation_space.shape : ", env.observation_space.shape )
print("action_space: ", env.action_space)
print("action_space.shape : ", env.action_space.shape )


model = build_model(states, actions)
print(model.summary())

#%% Build Agent with Keras-RL
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

def build_agent (model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit = 50000, window_length=1)
    dqn = DQNAgent (model = model, memory = memory, policy=policy,
                    nb_actions=actions, nb_steps_warmup=10, target_model_update= 1e-2)
    return dqn

dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-3), metrics = ['mae'])
dqn.fit (env, nb_steps = 4000, visualize=False, verbose = 1)

When I run this code, I get the following error message:

ValueError: Model output "Tensor("dense_23/BiasAdd:0", shape=(None, 3), dtype=float32)" has invalid shape. DQN expects a model that has one dimension for each action, in this case (3,).

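It is raised by the line dqn = DQNAgent(model = model, memory = memory, policy=policy, nb_actions=actions, nb_steps_warmup=10, target_model_update= 1e-2)

Can anyone tell me why this problem occurs and how to solve it? I assume it has something to do with the model that is built, and therefore with the action and state spaces, but I could not figure out what exactly the problem is.

Bounty reminder: my bounty is about to expire and, unfortunately, I still have not received an answer. If you have even an idea of how to tackle this problem, I would be very grateful if you shared your thoughts.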

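Best answer

As we discussed in the comments, the Keras-rl library appears to be no longer maintained (the last update to the repository was in 2019), so everything is probably inside Keras now. I looked through the Keras documentation: there is no high-level function for building a reinforcement learning model, but it can be done with lower-level functions.

  • Here is an example of how to use Deep Q-Learning with Keras: link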
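To give a rough idea of what "lower-level functions" means here, the two core pieces of a hand-written DQN loop are action selection and the one-step learning target. The sketch below shows them in plain Python. The names epsilon_greedy and td_target are hypothetical helpers, not part of Keras or Keras-rl; in a real agent the Q-values would come from the Keras model (and usually a separate target network) rather than from hard-coded lists.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Illustrative values only (not taken from the question's environment).
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], done=False, gamma=0.5)
print(action, target)
```

The network would then be trained so that its output for the chosen action moves toward the target value.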

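Another solution may be to downgrade to Tensorflow 1.0, since some changes in version 2.0 seem to cause the compatibility problem. I have not tested it, but Keras-rl + Tensorflow 1.0 may work.

There is also a branch of Keras-rl that supports Tensorflow 2.0. The repository is archived, but it may work for you.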
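Separately from the version question, the error text itself hints at a possible cause worth checking. The message ends with "in this case (3,)", which suggests that nb_actions received the tuple env.action_space.shape rather than an integer. DQN also works with a single discrete action per step, whereas the environment in the question expects a vector of three integers. One way around both points, sketched below under those assumptions, is to enumerate every combination of the three action variables as one flat index. The helper names are hypothetical, and output_shape_matches is only an assumed approximation of the agent's shape check, not the library's actual code.

```python
# Sizes of the three integer action variables from the question:
# 0..10, 0..20 and 0..27 inclusive.
ACTION_SIZES = (11, 21, 28)


def count_joint_actions(sizes=ACTION_SIZES):
    """Number of distinct (a0, a1, a2) combinations."""
    total = 1
    for size in sizes:
        total *= size
    return total


def encode_action(values, sizes=ACTION_SIZES):
    """Map a tuple of per-variable values to one flat action index."""
    index = 0
    for value, size in zip(values, sizes):
        index = index * size + value
    return index


def decode_action(index, sizes=ACTION_SIZES):
    """Map one flat action index back to a tuple of per-variable values."""
    values = []
    for size in reversed(sizes):
        index, value = divmod(index, size)
        values.append(value)
    return tuple(reversed(values))


def output_shape_matches(output_shape, nb_actions):
    """Assumed form of the check: the model output must be (None, nb_actions)."""
    return tuple(output_shape) == (None, nb_actions)


nb_actions = count_joint_actions()
print(nb_actions)
print(decode_action(nb_actions - 1))
print(output_shape_matches((None, 3), (3,)))                # tuple passed as nb_actions
print(output_shape_matches((None, nb_actions), nb_actions))  # integer passed as nb_actions
```

With this encoding, the environment could declare action_space = Discrete(nb_actions) and call decode_action(action) at the top of step, the last Dense layer would get nb_actions units, and the agent would receive nb_actions as a plain integer. I have not tested this against Keras-rl, and an output layer with several thousand units may learn slowly.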

This question, python - OpenAI Gym and Keras-RL: DQN expects a model that has one dimension for each action, corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/70261352/
