amazon-web-services - AWS SageMaker RL with ray: ray.tune.error.TuneError: No trainable specified

Tags: amazon-web-services reinforcement-learning amazon-sagemaker ray rllib

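I have a training script based on the AWS SageMaker RL example rl_network_compression_ray_custom, but I changed the environment to a basic Gym environment, Asteroids-v0 (the dependencies are installed at the main entry point of the training script). When I call fit on the RLEstimator, I get the error ray.tune.error.TuneError: No trainable specified!, even though run is set to DQN in the training config.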

Does anyone know about this issue and how to solve it?

Here is the longer log:

Running experiment with config {
  "training": {
    "env": "Asteroids-v0",
    "run": "DQN",
    "stop": {
      "training_iteration": 1
    },
    "local_dir": "/opt/ml/output/intermediate",
    "checkpoint_freq": 10,
    "config": {
      "double_q": false,
      "dueling": false,
      "num_atoms": 1,
      "noisy": false,
      "prioritized_replay": false,
      "n_step": 1,
      "target_network_update_freq": 8000,
      "lr": 6.25e-05,
      "adam_epsilon": 0.00015,
      "hiddens": [
        512
      ],
      "learning_starts": 20000,
      "buffer_size": 1000000,
      "sample_batch_size": 4,
      "train_batch_size": 32,
      "schedule_max_timesteps": 2000000,
      "exploration_final_eps": 0.01,
      "exploration_fraction": 0.1,
      "prioritized_replay_alpha": 0.5,
      "beta_annealing_fraction": 1.0,
      "final_prioritized_replay_beta": 1.0,
      "num_gpus": 0.2,
      "timesteps_per_iteration": 10000
    },
    "checkpoint_at_end": true
  },
  "trial_resources": {
    "cpu": 1,
    "extra_cpu": 3
  }
}
Important! Ray with version <=7.2 may report "Did not find checkpoint file" even if the experiment is actually restored successfully. If restoration is expected, please check "training_iteration" in the experiment info to confirm.
Traceback (most recent call last):
  File "train-ray.py", line 83, in <module>
    MyLauncher().train_main()
  File "/opt/ml/code/sagemaker_rl/ray_launcher.py", line 332, in train_main
    launcher.launch()
  File "/opt/ml/code/sagemaker_rl/ray_launcher.py", line 313, in launch
    run_experiments(experiment_config)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/tune.py", line 296, in run_experiments
    experiments = convert_to_experiment_list(experiments)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/experiment.py", line 199, in convert_to_experiment_list
    for name, spec in experiments.items()
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/experiment.py", line 199, in <listcomp>
    for name, spec in experiments.items()
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/experiment.py", line 122, in from_json
    raise TuneError("No trainable specified!")
ray.tune.error.TuneError: No trainable specified!
2020-04-22 13:21:15,784 sagemaker-containers ERROR    ExecuteUserScriptError:
Command "/usr/bin/python train-ray.py --rl.training.checkpoint_freq 1 --rl.training.stop.training_iteration 1 --s3_bucket XXXXX

Best answer

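The log indicates that the experiment config is not being passed in correctly. Could you try the roboschool environment instead, since it is simpler, and post the error log if one appears? Also make sure that all dependencies are included in the Dockerfile used to build the custom image.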
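One way to check how the config is being read, offered as a sketch rather than a confirmed diagnosis: the traceback shows convert_to_experiment_list looping over experiments.items(), so every top-level key of the printed config is treated as its own experiment spec. Assuming from_json raises "No trainable specified!" whenever a spec has no "run" key (an assumption about the Ray version in the image), the helper below lists the top-level entries that would trigger it. The name specs_missing_run is hypothetical and not part of Ray or SageMaker, and the config is an abbreviated copy of the one in the log.

```python
# Sketch only: mimics the check implied by the traceback, without importing Ray.
# `specs_missing_run` is a hypothetical helper, not a Ray or SageMaker API.


def specs_missing_run(experiments):
    """Return the names of top-level entries that have no "run" key."""
    return [
        name
        for name, spec in experiments.items()
        if not isinstance(spec, dict) or "run" not in spec
    ]


# Abbreviated copy of the config printed in the log above.
logged_config = {
    "training": {
        "env": "Asteroids-v0",
        "run": "DQN",
        "stop": {"training_iteration": 1},
    },
    "trial_resources": {"cpu": 1, "extra_cpu": 3},
}

print(specs_missing_run(logged_config))  # prints ['trial_resources']
```

If this reading is right, "trial_resources" sits next to "training" instead of inside it, so it is parsed as a second experiment that has no "run". Moving it into the "training" entry would leave a single spec; whether that key is accepted there depends on the Ray version in the image, and this has not been tested against the setup in the question.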

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/61366474/
