amazon-web-services - AWS SageMaker RL 与 ray : ray. tune.error.TuneError:未指定可训练项

我有一个基于 AWS SageMaker RL 示例 rl_network_compression_ray_custom 的训练脚本，但更改了环境以制作基本的健身房环境 Asteroids-v0(在训练脚本的主入口点安装依赖项)。当我在 RLEstimator 上运行拟合时，即使在训练配置中将运行指定为 DQN，也会出现以下错误 ray.tune.error.TuneError:未指定可训练!。

有人知道这个问题以及如何解决吗？

这是更长的日志:

Running experiment with config {
  "training": {
    "env": "Asteroids-v0",
    "run": "DQN",
    "stop": {
      "training_iteration": 1
    },
    "local_dir": "/opt/ml/output/intermediate",
    "checkpoint_freq": 10,
    "config": {
      "double_q": false,
      "dueling": false,
      "num_atoms": 1,
      "noisy": false,
      "prioritized_replay": false,
      "n_step": 1,
      "target_network_update_freq": 8000,
      "lr": 6.25e-05,
      "adam_epsilon": 0.00015,
      "hiddens": [
        512
      ],
      "learning_starts": 20000,
      "buffer_size": 1000000,
      "sample_batch_size": 4,
      "train_batch_size": 32,
      "schedule_max_timesteps": 2000000,
      "exploration_final_eps": 0.01,
      "exploration_fraction": 0.1,
      "prioritized_replay_alpha": 0.5,
      "beta_annealing_fraction": 1.0,
      "final_prioritized_replay_beta": 1.0,
      "num_gpus": 0.2,
      "timesteps_per_iteration": 10000
    },
    "checkpoint_at_end": true
  },
  "trial_resources": {
    "cpu": 1,
    "extra_cpu": 3
  }
}
Important! Ray with version <=7.2 may report "Did not find checkpoint file" even if the experiment is actually restored successfully. If restoration is expected, please check "training_iteration" in the experiment info to confirm.
Traceback (most recent call last):
  File "train-ray.py", line 83, in <module>
    MyLauncher().train_main()
  File "/opt/ml/code/sagemaker_rl/ray_launcher.py", line 332, in train_main
    launcher.launch()
  File "/opt/ml/code/sagemaker_rl/ray_launcher.py", line 313, in launch
    run_experiments(experiment_config)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/tune.py", line 296, in run_experiments
    experiments = convert_to_experiment_list(experiments)
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/experiment.py", line 199, in convert_to_experiment_list
    for name, spec in experiments.items()
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/experiment.py", line 199, in <listcomp>
    for name, spec in experiments.items()
  File "/usr/local/lib/python3.6/dist-packages/ray/tune/experiment.py", line 122, in from_json
    raise TuneError("No trainable specified!")
ray.tune.error.TuneError: No trainable specified!
2020-04-22 13:21:15,784 sagemaker-containers ERROR    ExecuteUserScriptError:
Command "/usr/bin/python train-ray.py --rl.training.checkpoint_freq 1 --rl.training.stop.training_iteration 1 --s3_bucket XXXXX

最佳答案

日志表明实验配置未正确传入。你能尝试一下 roboschool相反，环境更简单，并提供错误日志(如果出现)。请确保所有依赖项都包含在 Dockerfile 中以构建自定义镜像。

关于amazon-web-services - AWS SageMaker RL 与 ray : ray. tune.error.TuneError:未指定可训练项，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/61366474/

amazon-web-services - AWS SageMaker RL 与 ray : ray. tune.error.TuneError:未指定可训练项

上一篇：r - 使用现有绘图创建 Shiny 的下拉菜单

下一篇：python - 如何使用多个模板实现 Django session 向导