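I am trying to create a custom algorithm by following the instructions in this tutorial.
When I run the training job, it fails with the error: No such file or directory: '/opt/ml/input/data/training'.
According to the documentation, SageMaker should create these directories at runtime and copy the data and artifacts into them. That is not happening.
Please share your thoughts on this.
My Dockerfile contents: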
# Build an image that can do training and inference in SageMaker
# This is a Python 2 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.
FROM ubuntu:16.04
MAINTAINER Amazon AI <sage-learner@amazon.com>
RUN apt-get -y update && apt-get install -y --no-install-recommends \
        wget \
        python \
        nginx \
        ca-certificates \
    && rm -rf /var/lib/apt/lists/*
# Here we get all python packages.
# There's substantial overlap between scipy and numpy that we eliminate by
# linking them together. Likewise, pip leaves the install caches populated which uses
# a significant amount of space. These optimizations save a fair amount of space in the
# image, which reduces start up time.
RUN wget https://bootstrap.pypa.io/get-pip.py && python get-pip.py && \
    pip install numpy==1.16.2 scipy==1.2.1 scikit-learn==0.20.2 pandas flask gevent gunicorn && \
    (cd /usr/local/lib/python2.7/dist-packages/scipy/.libs; rm *; ln ../../numpy/.libs/* .) && \
    rm -rf /root/.cache
# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
# Set up the program in the image
COPY decision_trees /opt/program
WORKDIR /opt/program
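Best answer
The name of the training folder depends on the InputDataConfig you provide in the CreateTrainingJob operation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html#SageMaker-CreateTrainingJob-request-InputDataConfig
If the channel name is "xyz", SageMaker creates a folder with that name at the location in question (/opt/ml/input/data/xyz). So /opt/ml/input/data/training exists only if one of the channels is named "training".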
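To make the channel-to-folder mapping concrete, here is a minimal Python sketch. It is not taken from the tutorial: the helper names channel_dir and available_channels are hypothetical, and the S3 bucket and prefix are placeholders. The input_data_config list only illustrates the shape of what would be passed as InputDataConfig to CreateTrainingJob, with the channel named "training" so that the path from the error message would exist.

```python
import os
import posixpath

# Base directory under which SageMaker places one folder per input channel.
INPUT_DATA_DIR = "/opt/ml/input/data"


def channel_dir(channel_name, base_dir=INPUT_DATA_DIR):
    """Return the in-container path for a channel name (hypothetical helper)."""
    return posixpath.join(base_dir, channel_name)


def available_channels(base_dir=INPUT_DATA_DIR):
    """List the channel folders that actually exist under base_dir.

    Handy inside the train script to see which channel names were
    really configured before trying to open a hard-coded path.
    """
    if not os.path.isdir(base_dir):
        return []
    return sorted(
        name for name in os.listdir(base_dir)
        if os.path.isdir(os.path.join(base_dir, name))
    )


# Illustrative InputDataConfig; bucket and prefix are placeholders (assumptions).
input_data_config = [
    {
        "ChannelName": "training",  # becomes /opt/ml/input/data/training
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/my-prefix/",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }
]

for channel in input_data_config:
    print(channel["ChannelName"], "->", channel_dir(channel["ChannelName"]))
```

If the train script reads from /opt/ml/input/data/training, either name the channel "training" in the job request or change the path in the script to match the channel name you used. Calling available_channels() at the start of the script is a quick way to check which folders were created.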
Source: "SageMaker Train job not creating /opt/ml/input/data/training directory" (tag: amazon-web-services), Stack Overflow: https://stackoverflow.com/questions/55115686/