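I'm trying to deploy a TensorFlow model using Docker, TensorFlow Serving, and Heroku. Everything works fine, but when the TF Serving container finishes initializing (when it prints "Entering the event loop"), the Heroku web dyno suddenly crashes. It then restarts and tries again, but crashes again when it reaches the event loop. The third time, Heroku does not spin the dyno up again at all.
First, I deploy the image, which works without problems: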
C:\Users\whitm\Desktop\CodeProjects\deep-deblurring-serving>heroku container:release web
Releasing images web to deep-deblurring-serving... done
C:\Users\whitm\Desktop\CodeProjects\deep-deblurring-serving>heroku ps
Free dyno hours quota remaining this month: 550h 0m (100%)
Free dyno usage for this app: 0h 0m (0%)
For more information on dyno sleeping and how to upgrade, see:
https://devcenter.heroku.com/articles/dyno-sleeping
=== web (Free): /usr/bin/tf_serving_entrypoint.sh (1)
web.1: starting 2020/04/10 15:36:38 -0400 (~ 6s ago)
About a minute into initialization (once TF Serving reaches the event loop), the dyno crashes:
2020-04-10T19:36:53.234387+00:00 app[web.1]: [evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2020-04-10T19:36:53.234389+00:00 app[web.1]: 2020-04-10 19:36:53.234341: I tensorflow_serving/model_servers/server.cc:378] Exporting HTTP/REST API at:localhost:8501 ...
2020-04-10T19:37:46.597354+00:00 heroku[web.1]: State changed from starting to crashed
2020-04-10T19:37:46.602976+00:00 heroku[web.1]: State changed from crashed to starting
Heroku then restarts it automatically.
C:\Users\whitm\Desktop\CodeProjects\deep-deblurring-serving>heroku ps
Free dyno hours quota remaining this month: 550h 0m (100%)
Free dyno usage for this app: 0h 0m (0%)
For more information on dyno sleeping and how to upgrade, see:
https://devcenter.heroku.com/articles/dyno-sleeping
=== web (Free): /usr/bin/tf_serving_entrypoint.sh (1)
web.1: restarting 2020/04/10 15:37:46 -0400 (~ 45s ago)
This cycle repeats three times, and after the last one Heroku stops restarting the dyno:
C:\Users\whitm\Desktop\CodeProjects\deep-deblurring-serving>heroku ps
Free dyno hours quota remaining this month: 550h 0m (100%)
Free dyno usage for this app: 0h 0m (0%)
For more information on dyno sleeping and how to upgrade, see:
https://devcenter.heroku.com/articles/dyno-sleeping
=== web (Free): /usr/bin/tf_serving_entrypoint.sh (1)
web.1: crashed 2020/04/10 15:38:53 -0400 (~ 3m ago)
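The container itself is not the problem: locally it works like a charm, reaches the event loop, and starts listening for incoming requests, and I can make requests without any issue. So the problem is on the Heroku side, but I don't know what is happening. Maybe Heroku is interpreting the app as unresponsive? I don't know. Worst of all, I can't SSH into the container unless the dyno is in the "running" state, and it never reaches that state because it crashes during initialization.
One last thing: the container uses 448MB of RAM locally and Heroku's free dynos have 500MB, so I suspected it was crashing because of memory, but again, I can't check what is happening.
What can I do, and where can I look?
Thanks in advance!
P.S.: I tried running a lighter model that uses 20MB of RAM locally, but the result on Heroku is the same: the dyno crashes.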
Best answer
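I solved the problem. It was caused by a container port mismatch. Basically, TensorFlow Serving tries to use its default port 8501 for the REST API, but Heroku actually assigns a different port to expose the container. The solution is to update the /usr/bin/tf_serving_entrypoint.sh file so that it tells tensorflow_model_server to use the port assigned by Heroku.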
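The port handling described above can be sketched as a small shell fragment. This is only an illustration, not the exact entrypoint script: the helper name `rest_port` and the fallback to 8501 for local runs are assumptions added here. Heroku passes the assigned port to the container in the `PORT` environment variable and expects a web process to bind to it within roughly 60 seconds of launch, which would be consistent with the crash timing in the logs.

```shell
#!/bin/bash
# Hypothetical helper: choose the REST port for TensorFlow Serving.
# Uses Heroku's PORT when it is set, otherwise falls back to the
# TF Serving default of 8501 (the fallback is an assumption for local runs).
rest_port() {
  echo "${PORT:-8501}"
}

# Print the command an entrypoint would run (printed here, not executed).
echo "tensorflow_model_server --rest_api_port=$(rest_port)"
```

With the fallback in place, the same script could be used both locally, where `PORT` is usually unset, and on Heroku.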
This is the new Dockerfile:
FROM tensorflow/serving
LABEL maintainer="Whitman Bohorquez" description="Build tf serving based image. This repo must be used as build context"
COPY / /
RUN apt-get update && apt-get install -y git && git reset --hard
ENV MODEL_NAME=deblurrer MODEL_BASE_PATH=/models
RUN echo '#!/bin/bash \n\n\
tensorflow_model_server \
--rest_api_port=$PORT \
--model_name=${MODEL_NAME} \
--model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} \
"$@"' > /usr/bin/tf_serving_entrypoint.sh \
&& chmod +x /usr/bin/tf_serving_entrypoint.sh
# CMD is required to run on Heroku
CMD ["/usr/bin/tf_serving_entrypoint.sh"]
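One way to check the fix before releasing is to mimic Heroku locally: pass an arbitrary `PORT` to the container and query TensorFlow Serving's model status endpoint on that port. The following is a minimal sketch under assumptions: `deblurrer-serving` is a hypothetical local image tag, port 5000 is chosen arbitrarily, and `status_url` is a helper name introduced here. The `docker run` flags may need adjusting for your setup.

```shell
#!/bin/bash
# Hypothetical helper: build the TF Serving model-status URL
# for a given port ($1) and model name ($2).
status_url() {
  echo "http://localhost:${1}/v1/models/${2}"
}

# Example usage (commented out because it needs Docker and a built image).
# --entrypoint is overridden so the image's CMD is not passed as an extra argument:
#   docker run --rm -e PORT=5000 -p 5000:5000 \
#     --entrypoint /usr/bin/tf_serving_entrypoint.sh deblurrer-serving
#   curl "$(status_url 5000 deblurrer)"

# Print the URL that would be queried.
status_url 5000 deblurrer
```

If the server answers on the port given through `PORT` rather than on 8501, the entrypoint is picking up the variable as intended.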
Source: the Stack Overflow question "python - Heroku dyno crashes after TensorFlow Serving container enters the event loop": https://stackoverflow.com/questions/61147674/