Docker X server for NVIDIA OpenGL applications (no X on the host)

Tags: docker ubuntu opengl nvidia headless

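I am trying to create a Docker image that runs an X server for headless OpenGL applications on an NVIDIA GPU (useful for generating textures, running Unity3D without a screen, and so on). In this setup the host does not run an X server, and I want to do everything inside the container.
I use this Dockerfile for the image: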

FROM ubuntu:18.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt update && \
    apt install -y \
        libglvnd0 \
        libgl1 \
        libglx0 \
        libegl1 \
        libgles2 \
        xserver-xorg-video-nvidia-440

COPY xorg.conf.nvidia-headless /etc/X11/xorg.conf

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES graphics
ENV DISPLAY :1

ENTRYPOINT ["/bin/bash"]
I generated xorg.conf.nvidia-headless with nvidia-xconfig:
Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
EndSection

Section "Files"
EndSection

Section "Module"
    Load           "dbe"
    Load           "extmod"
    Load           "type1"
    Load           "freetype"
    Load           "glx"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "UseDisplayDevice" "None"
    SubSection     "Display"
        Virtual     1920 1080
        Depth       24
    EndSubSection
EndSection
I run the container with --privileged and --gpus all through nvidia-docker, and I share the device with --device=/dev/dri/card0. Inside the container, nvidia-smi runs without problems.
Once the container is running, I start an X server:
Xorg -noreset +extension GLX +extension RANDR +extension RENDER -logfile ./xserver.log vt1 :1
But it fails with an error:
(EE) 
Fatal server error:
(EE) no screens found(EE) 
(EE) 
Here is the full log:
X.Org X Server 1.19.6
Release Date: 2017-12-20
[  1296.109] X Protocol Version 11, Revision 0
[  1296.109] Build Operating System: Linux 4.4.0-168-generic x86_64 Ubuntu
[  1296.109] Current Operating System: Linux ubuntu 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64
[  1296.109] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-112-generic root=UUID=8f2dc01d-1666-4abd-9bd1-cfe0a20afdf1 ro splash quiet vt.handoff=1
[  1296.109] Build Date: 14 November 2019  06:20:00PM
[  1296.109] xorg-server 2:1.19.6-1ubuntu4.4 (For technical support please see http://www.ubuntu.com/support) 
[  1296.109] Current version of pixman: 0.34.0
[  1296.109]    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
[  1296.109] Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[  1296.110] (++) Log file: "./xserver.log", Time: Wed Aug 19 08:38:46 2020
[  1296.110] (==) Using config file: "/etc/X11/xorg.conf"
[  1296.110] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[  1296.111] (==) ServerLayout "Layout0"
[  1296.111] (**) |-->Screen "Screen0" (0)
[  1296.111] (**) |   |-->Monitor "Monitor0"
[  1296.112] (**) |   |-->Device "Device0"
[  1296.112] (**) |-->Input Device "Keyboard0"
[  1296.112] (**) |-->Input Device "Mouse0"
[  1296.112] (==) Automatically adding devices
[  1296.112] (==) Automatically enabling devices
[  1296.112] (==) Automatically adding GPU devices
[  1296.112] (==) Automatically binding GPU devices
[  1296.112] (==) Max clients allowed: 256, resource mask: 0x1fffff
[  1296.114] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[  1296.114]    Entry deleted from font path.
[  1296.114] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[  1296.114]    Entry deleted from font path.
[  1296.114] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[  1296.114]    Entry deleted from font path.
[  1296.114] (WW) The directory "/usr/share/fonts/X11/Type1" does not exist.
[  1296.114]    Entry deleted from font path.
[  1296.114] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[  1296.114]    Entry deleted from font path.
[  1296.114] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[  1296.114]    Entry deleted from font path.
[  1296.114] (==) FontPath set to:
    /usr/share/fonts/X11/misc,
    built-ins
[  1296.114] (==) ModulePath set to "/usr/lib/xorg/modules"
[  1296.114] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[  1296.114] (WW) Disabling Keyboard0
[  1296.114] (WW) Disabling Mouse0
[  1296.115] (II) Loader magic: 0x55dca9edc020
[  1296.115] (II) Module ABI versions:
[  1296.115]    X.Org ANSI C Emulation: 0.4
[  1296.115]    X.Org Video Driver: 23.0
[  1296.115]    X.Org XInput driver : 24.1
[  1296.115]    X.Org Server Extension : 10.0
[  1296.116] (EE) dbus-core: error connecting to system bus: org.freedesktop.DBus.Error.FileNotFound (Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory)
[  1296.116] (++) using VT number 1

[  1296.116] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[  1296.116] (II) xfree86: Adding drm device (/dev/dri/card0)
[  1296.119] (**) OutputClass "nvidia" ModulePath extended to "/usr/lib/x86_64-linux-gnu/nvidia/xorg,/usr/lib/xorg/modules"
[  1296.122] (--) PCI:*(0:1:0:0) 10de:100c:1043:84b7 rev 161, Mem @ 0xf9000000/16777216, 0xd0000000/134217728, 0xd8000000/33554432, I/O @ 0x0000e000/128, BIOS @ 0x????????/131072
[  1296.122] (II) LoadModule: "glx"
[  1296.123] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[  1296.131] (EE) Failed to load /usr/lib/xorg/modules/extensions/libglx.so: /usr/lib/xorg/modules/extensions/libglx.so: undefined symbol: glxServer
[  1296.131] (II) UnloadModule: "glx"
[  1296.131] (II) Unloading glx
[  1296.131] (EE) Failed to load module "glx" (loader failed, 7)
[  1296.131] (II) LoadModule: "nvidia"
[  1296.131] (II) Loading /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so
[  1296.138] (II) Module nvidia: vendor="NVIDIA Corporation"
[  1296.139]    compiled for 1.6.99.901, module version = 1.0.0
[  1296.139]    Module class: X.Org Video Driver
[  1296.140] (II) NVIDIA dlloader X Driver  440.100  Fri May 29 08:21:27 UTC 2020
[  1296.140] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  1296.141] (II) Loading sub module "fb"
[  1296.141] (II) LoadModule: "fb"
[  1296.141] (II) Loading /usr/lib/xorg/modules/libfb.so
[  1296.143] (II) Module fb: vendor="X.Org Foundation"
[  1296.143]    compiled for 1.19.6, module version = 1.0.0
[  1296.143]    ABI class: X.Org ANSI C Emulation, version 0.4
[  1296.143] (II) Loading sub module "wfb"
[  1296.143] (II) LoadModule: "wfb"
[  1296.143] (II) Loading /usr/lib/xorg/modules/libwfb.so
[  1296.144] (II) Module wfb: vendor="X.Org Foundation"
[  1296.144]    compiled for 1.19.6, module version = 1.0.0
[  1296.144]    ABI class: X.Org ANSI C Emulation, version 0.4
[  1296.144] (II) Loading sub module "ramdac"
[  1296.144] (II) LoadModule: "ramdac"
[  1296.144] (II) Module "ramdac" already built-in
[  1296.145] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[  1296.145] (EE) NVIDIA:     system's kernel log for additional error messages and
[  1296.145] (EE) NVIDIA:     consult the NVIDIA README for details.
[  1296.145] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[  1296.145] (EE) NVIDIA:     system's kernel log for additional error messages and
[  1296.145] (EE) NVIDIA:     consult the NVIDIA README for details.
[  1296.145] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[  1296.145] (EE) NVIDIA:     system's kernel log for additional error messages and
[  1296.145] (EE) NVIDIA:     consult the NVIDIA README for details.
[  1296.145] (EE) No devices detected.
[  1296.145] (II) Applying OutputClass "nvidia" to /dev/dri/card0
[  1296.145]    loading driver: nvidia
[  1296.145] (==) Matched nvidia as autoconfigured driver 0
[  1296.145] (==) Matched nouveau as autoconfigured driver 1
[  1296.145] (==) Matched nouveau as autoconfigured driver 2
[  1296.145] (==) Matched modesetting as autoconfigured driver 3
[  1296.145] (==) Matched fbdev as autoconfigured driver 4
[  1296.145] (==) Matched vesa as autoconfigured driver 5
[  1296.145] (==) Assigned the driver to the xf86ConfigLayout
[  1296.145] (II) LoadModule: "nvidia"
[  1296.145] (II) Loading /usr/lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so
[  1296.145] (II) Module nvidia: vendor="NVIDIA Corporation"
[  1296.145]    compiled for 1.6.99.901, module version = 1.0.0
[  1296.145]    Module class: X.Org Video Driver
[  1296.145] (II) UnloadModule: "nvidia"
[  1296.145] (II) Unloading nvidia
[  1296.145] (II) Failed to load module "nvidia" (already loaded, 21980)
[  1296.145] (II) LoadModule: "nouveau"
[  1296.146] (WW) Warning, couldn't open module nouveau
[  1296.146] (II) UnloadModule: "nouveau"
[  1296.146] (II) Unloading nouveau
[  1296.146] (EE) Failed to load module "nouveau" (module does not exist, 0)
[  1296.146] (II) LoadModule: "modesetting"
[  1296.146] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[  1296.147] (II) Module modesetting: vendor="X.Org Foundation"
[  1296.147]    compiled for 1.19.6, module version = 1.19.6
[  1296.147]    Module class: X.Org Video Driver
[  1296.147]    ABI class: X.Org Video Driver, version 23.0
[  1296.147] (II) LoadModule: "fbdev"
[  1296.147] (WW) Warning, couldn't open module fbdev
[  1296.147] (II) UnloadModule: "fbdev"
[  1296.147] (II) Unloading fbdev
[  1296.147] (EE) Failed to load module "fbdev" (module does not exist, 0)
[  1296.147] (II) LoadModule: "vesa"
[  1296.147] (WW) Warning, couldn't open module vesa
[  1296.147] (II) UnloadModule: "vesa"
[  1296.147] (II) Unloading vesa
[  1296.147] (EE) Failed to load module "vesa" (module does not exist, 0)
[  1296.147] (II) NVIDIA dlloader X Driver  440.100  Fri May 29 08:21:27 UTC 2020
[  1296.147] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[  1296.147] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[  1296.147] (WW) xf86OpenConsole: setpgid failed: Operation not permitted
[  1296.147] (WW) xf86OpenConsole: setsid failed: Operation not permitted
[  1296.147] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[  1296.147] (EE) NVIDIA:     system's kernel log for additional error messages and
[  1296.147] (EE) NVIDIA:     consult the NVIDIA README for details.
[  1296.147] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[  1296.147] (EE) NVIDIA:     system's kernel log for additional error messages and
[  1296.147] (EE) NVIDIA:     consult the NVIDIA README for details.
[  1296.147] (WW) Falling back to old probe method for modesetting
[  1296.147] (EE) Screen 0 deleted because of no matching config section.
[  1296.147] (II) UnloadModule: "modesetting"
[  1296.147] (EE) Device(s) detected, but none match those in the config file.
[  1296.147] (EE) 
Fatal server error:
[  1296.147] (EE) no screens found(EE) 
[  1296.147] (EE) 
Please consult the The X.Org Foundation support 
     at http://wiki.x.org
 for help. 
[  1296.147] (EE) Please also check the log file at "./xserver.log" for additional information.
[  1296.147] (EE) 
[  1296.149] (EE) Server terminated with error (1). Closing log file.
Can anyone help me? This will run on a headless machine with an NVIDIA GPU.

Best answer

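First things first: if you want headless OpenGL, don't use an X server!
It has been several years since you needed an X server to talk to the GPU. You can do headless rendering perfectly well without one. NVIDIA has a good article on how to do this: https://developer.nvidia.com/blog/egl-eye-opengl-visualization-without-x-server/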
The gist is that you use EGL to set up the context, and you make that context current without a surface by calling eglMakeCurrent(eglDpy, EGL_NO_SURFACE, EGL_NO_SURFACE, eglCtx);
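You still need the NVIDIA driver package for Xorg, because it also ships all the off-screen pieces, but there is one important caveat: the NVIDIA userland driver must match the version of the nvidia kernel module on the host system. If you bake the driver into the Docker image, you are effectively tying the image to one specific kernel module version on the host. That is not an ideal situation. Instead, configure your Docker image to bind the driver and the OpenGL implementation libraries in from the host system. Unfortunately, there is no common location for these libraries and drivers, so pulling them all in reliably takes some extra effort. But don't despair, NVIDIA has already done this work for you:
https://gitlab.com/nvidia/container-images/opengl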
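In addition, to set up an off-screen context reliably, it helps to unset the DISPLAY variable. NVIDIA built all of their Vulkan and EGL support on top of the Xorg driver, so some code paths evaluate that variable, and unsetting it helps push all of them in the right direction. So in your program, call unsetenv("DISPLAY") before setting up the OpenGL context.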

Original question on Stack Overflow, "Docker X server for NVIDIA OpenGL applications (no X on the host)": https://stackoverflow.com/questions/63483222/
