python - How to convert sRGB to NV12 format using NumPy?

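The NV12 format defines a specific color channel ordering of the YUV color space with 4:2:0 subsampling.
The NV12 format is mainly used in video encoding/decoding pipelines.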

NV12 is a biplanar format with a full-sized Y plane followed by a single chroma plane with interleaved U and V values. NV21 is the same but with interleaved V and U values. The 12 in NV12 refers to 12 bits per pixel. NV12 has a half-width, half-height chroma channel, and therefore uses 4:2:0 subsampling.
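The byte layout described above can be sketched in NumPy as follows. This is a minimal illustration, assuming even width and height; the helper names nv12_size and nv12_to_nv21 are hypothetical and not part of the original post. The first computes the frame size (which works out to 12 bits per pixel), and the second turns NV12 into NV21 by swapping each interleaved U, V pair.

```python
import numpy as np

def nv12_size(width, height):
    """Number of bytes in an NV12 frame (width and height assumed even)."""
    y_bytes = width * height                      # full-resolution Y plane, 8 bits per sample
    uv_bytes = (width // 2) * (height // 2) * 2   # one U and one V per 2x2 pixel block
    return y_bytes + uv_bytes

def nv12_to_nv21(nv12, width, height):
    """Swap the interleaved U,V pairs to V,U (NV12 -> NV21); the Y plane is unchanged."""
    flat = np.asarray(nv12, dtype=np.uint8).ravel()
    y_bytes = width * height
    nv21 = flat.copy()
    nv21[y_bytes::2] = flat[y_bytes + 1::2]   # V samples move to the even chroma positions
    nv21[y_bytes + 1::2] = flat[y_bytes::2]   # U samples move to the odd chroma positions
    return nv21
```

For a 640x480 frame, nv12_size(640, 480) gives 460800 bytes, that is 640*480*1.5 bytes or 12 bits per pixel.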

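In the context of NV12, the YUV format mainly refers to the YCbCr color space.
NV12 elements are 8 bits per element (uint8 type).
In the context of this post, the YUV elements are in the "limited range" standard: the Y range is [16, 235], and the U, V range is [16, 240].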
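As a quick check of the limited-range bounds, the sketch below applies the BT.709 coefficients used in the answer's code to a few extreme colors. Black should map to Y = 16 and white to Y = 235 (with neutral U = V = 128), while saturated colors push U or V to the edges of [16, 240]. The helper rgb_to_yuv_limited is hypothetical.

```python
import numpy as np

# BT.709 "limited range" matrix and offsets, copied from the answer's code.
M709 = np.array([[ 0.1826,  0.6142,  0.0620],
                 [-0.1006, -0.3386,  0.4392],
                 [ 0.4392, -0.3989, -0.0403]])
OFFSET = np.array([16.0, 128.0, 128.0])

def rgb_to_yuv_limited(rgb):
    """Convert one 8-bit sRGB triplet to limited-range Y, U, V (rounded to integers)."""
    return np.round(M709 @ np.asarray(rgb, dtype=np.float64) + OFFSET).astype(int)

print(rgb_to_yuv_limited([0, 0, 0]))        # black
print(rgb_to_yuv_limited([255, 255, 255]))  # white
print(rgb_to_yuv_limited([0, 0, 255]))      # blue: maximum U
print(rgb_to_yuv_limited([255, 0, 0]))      # red: maximum V
print(rgb_to_yuv_limited([255, 255, 0]))    # yellow: minimum U
```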

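sRGB (standard Red Green Blue) is the standard color space used by PC systems.
In the context of this post, the sRGB color component range is [0, 255] (uint8 type).
The ordering of the RGB elements is not relevant to the post (assume 3 color planes).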
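Because matplotlib returns PNG files as floating-point arrays in [0, 1] (possibly with an alpha channel), while other readers return uint8 in [0, 255], a small normalization step can be useful before the conversion. A sketch with a hypothetical helper to_rgb255, assuming 8-bit integer input or [0, 1] float input:

```python
import numpy as np

def to_rgb255(img):
    """Return an (H, W, 3) float64 array with sRGB values in [0, 255].

    Assumptions: integer input is 8-bit, already in [0, 255]; floating-point
    input (as returned by matplotlib for PNG files) is in [0, 1].
    An alpha channel, if present, is dropped.
    """
    img = np.asarray(img)
    rgb = img[..., :3]  # keep R, G, B; discard alpha
    if np.issubdtype(rgb.dtype, np.integer):
        return rgb.astype(np.float64)
    return rgb.astype(np.float64) * 255.0
```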

There are at least 2 possible YCbCr formats used with NV12:

BT.601 - applies to SDTV (standard definition).
BT.709 - applies to HDTV.
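For the purposes of this post, the two standards differ only in the coefficient matrix. As a sketch (not taken from the original post), the limited-range coefficients can be derived from the standard luma weights Kr, Kb (0.299, 0.114 for BT.601 and 0.2126, 0.0722 for BT.709); the helper limited_range_matrix is hypothetical. Rounded to 4 decimals, the result should match the coefficients hard-coded in the answer's code.

```python
import numpy as np

def limited_range_matrix(kr, kb):
    """Build the 3x3 sRGB -> limited-range YCbCr matrix from the luma weights.

    Y'  = kr*R + kg*G + kb*B            (kg = 1 - kr - kb)
    Cb  = (B - Y') / (2 * (1 - kb))
    Cr  = (R - Y') / (2 * (1 - kr))
    Limited range scales Y by 219/255 and Cb, Cr by 224/255;
    the offsets 16 and 128 are added separately.
    """
    kg = 1.0 - kr - kb
    luma = np.array([kr, kg, kb])
    cb = (np.array([0.0, 0.0, 1.0]) - luma) / (2.0 * (1.0 - kb))
    cr = (np.array([1.0, 0.0, 0.0]) - luma) / (2.0 * (1.0 - kr))
    return np.vstack((luma * 219.0 / 255.0, cb * 224.0 / 255.0, cr * 224.0 / 255.0))

bt601 = limited_range_matrix(0.299, 0.114)
bt709 = limited_range_matrix(0.2126, 0.0722)
print(np.round(bt709, 4))
```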

NV12 element ordering example:
YYYYYY
YYYYYY
UVUVUV
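To make the ordering concrete, here is a small sketch (the helper nv12_offsets is hypothetical) that computes where the Y, U and V samples of a given pixel are stored in a flat NV12 buffer, assuming even image dimensions. Every 2x2 block of pixels shares one U, V pair.

```python
def nv12_offsets(row, col, width, height):
    """Byte offsets of the Y, U and V samples covering pixel (row, col) in a flat NV12 buffer.

    The Y plane is width*height bytes; the UV plane follows it, with one
    interleaved U,V pair per 2x2 pixel block, so each UV row is `width` bytes.
    """
    y = row * width + col
    u = width * height + (row // 2) * width + (col // 2) * 2
    v = u + 1  # V immediately follows its U
    return y, u, v
```

For a 6x6 image, pixel (0, 0) maps to offsets (0, 36, 37) and pixel (5, 5) to (35, 52, 53); 53 is the last byte of the 54-byte frame.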

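The RGB to NV12 conversion can be divided into the following stages:

Color space conversion - convert from the sRGB color space to the YUV color space.
Chroma downsampling - shrink the U, V channels by a factor of x2 in each axis (convert from YUV444 to YUV420).
Chroma element interleaving - arrange the U, V elements as U, V, U, V...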

The following figure illustrates the conversion stages applied to a 6x6 pixel image:

How can we convert sRGB to NV12 using NumPy?

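Note:
The question refers to a Python implementation that demonstrates the conversion process (the post is not about existing functions such as the OpenCV implementation).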

Best answer

Converting sRGB to NV12 using NumPy

The purpose of the post is to demonstrate the conversion process.
The Python implementation below uses NumPy, and deliberately avoids using OpenCV.

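RGB to NV12 conversion stages:

Color space conversion - convert from the sRGB color space to the YUV color space:
Use the sRGB to YCbCr conversion formula.
Multiply each RGB triplet by a 3x3 conversion matrix, and add a vector of 3 offsets.
The post shows both BT.709 and BT.601 conversion (the only difference is the coefficient matrix).
Chroma downsampling - shrink the U, V channels by a factor of x2 in each axis (convert from YUV444 to YUV420).
The implementation resizes U and V by a factor of 0.5 in each axis using bilinear interpolation.
Note: bilinear interpolation is not the optimal downsampling method, but it is usually good enough.
Instead of using cv2.resize, the code uses the mean of every 2x2 pixels (the result is equivalent to bilinear interpolation).
Note: the implementation fails if the input resolution is not even in both dimensions.
Chroma element interleaving - arrange the U, V elements as U, V, U, V...
Implemented using array indexing operations.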
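Stages II and III can also be written with reshape and stack instead of strided slicing. The sketch below is an alternative formulation, not the answer's code; downsample_2x2 and interleave_uv are hypothetical names. It also makes the even-resolution requirement explicit by raising a ValueError, instead of failing with a broadcasting error as the four-slice sum does for odd sizes.

```python
import numpy as np

def downsample_2x2(plane):
    """Average each non-overlapping 2x2 block of a 2-D chroma plane."""
    rows, cols = plane.shape
    if rows % 2 or cols % 2:
        raise ValueError(f'plane size {rows}x{cols} must be even in both dimensions')
    # Split each axis into (blocks, 2) and average over the two within-block axes.
    return plane.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))

def interleave_uv(u, v):
    """Interleave two (H/2, W/2) planes into one (H/2, W) plane: u, v, u, v, ..."""
    # Pair U and V along a new last axis, then flatten each (u, v) pair into the row.
    return np.stack((u, v), axis=-1).reshape(u.shape[0], u.shape[1] * 2)
```

The result of downsample_2x2 matches the answer's (u[0::2, 0::2] + u[1::2, 0::2] + u[0::2, 1::2] + u[1::2, 1::2]) * 0.25 for even-sized planes.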

Here is a Python code sample that converts RGB to the NV12 standard:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import subprocess as sp  # The module is used for testing (using FFmpeg as reference).

do_use_bt709 = True  # True for BT.709, False for BT.601

rgb = mpimg.imread('rgb_input.png')[:, :, :3]*255.0   # Read RGB input image (PNG is read as float in [0, 1]), drop the alpha channel if present, multiply by 255 (set RGB range to [0, 255]).
r, g, b = np.squeeze(np.split(rgb, 3, -1))  # Split RGB to R, G and B numpy arrays.
rows, cols = r.shape

# I. Convert RGB to YUV (convert sRGB to YUV444)
#################################################
if do_use_bt709:
    # Convert sRGB to YUV, BT.709 standard
    # Conversion formula used: 8 bit sRGB to "limited range" 8 bit YUV (BT.709).            
    y =  0.1826*r + 0.6142*g + 0.0620*b + 16
    u = -0.1006*r - 0.3386*g + 0.4392*b + 128
    v =  0.4392*r - 0.3989*g - 0.0403*b + 128
else:
    # Convert sRGB to YUV, BT.601 standard.
    # Conversion formula used: 8 bit sRGB to "limited range" 8 bit YUV (BT.601).
    y =  0.2568*r + 0.5041*g + 0.0979*b + 16
    u = -0.1482*r - 0.2910*g + 0.4392*b + 128
    v =  0.4392*r - 0.3678*g - 0.0714*b + 128


# II. U,V Downscaling (convert YUV444 to YUV420)
##################################################
# Shrink U and V channels by a factor of x2 in each axis (use bi-linear interpolation).
#shrunk_u = cv2.resize(u, (cols//2, rows//2), interpolation=cv2.INTER_LINEAR)
#shrunk_v = cv2.resize(v, (cols//2, rows//2), interpolation=cv2.INTER_LINEAR)

# Each element of shrunkU is the mean of 2x2 elements of U
# Result is equivalent to resize by a factor of 0.5 with bi-linear interpolation.
shrunk_u = (u[0::2, 0::2] + u[1::2, 0::2] + u[0::2, 1::2] + u[1::2, 1::2]) * 0.25
shrunk_v = (v[0::2, 0::2] + v[1::2, 0::2] + v[0::2, 1::2] + v[1::2, 1::2]) * 0.25


# III. U,V Interleaving
########################
# Size of UV plane is half the number of rows, and same number of columns as Y plane.
uv = np.zeros((rows//2, cols))  # Use // for integer division.

# Interleave shrunk_u and shrunk_v and build UV plane (each row of UV plane is u,v,u,v,...)
uv[:, 0::2] = shrunk_u
uv[:, 1::2] = shrunk_v

# Place Y plane at the top, and UV plane at the bottom (the NV12 matrix has rows*1.5 rows)
nv12 = np.vstack((y, uv))

# Round NV12, and cast to uint8.
nv12 = np.round(nv12).astype('uint8')

# Write NV12 array to binary file
nv12.tofile('nv12_output.raw')

# Display NV12 result (display as Grayscale image).
plt.figure()
plt.axis('off')
plt.imshow(nv12, cmap='gray', interpolation='nearest')
plt.show()


# Testing - compare the NV12 result to FFmpeg conversion result:
################################################################################
color_matrix = 'bt709' if do_use_bt709 else 'bt601'

sp.run(['ffmpeg', '-y', '-i', 'rgb_input.png', '-vf', 
        f'scale=flags=fast_bilinear:out_color_matrix={color_matrix}:out_range=tv:dst_format=nv12',
        '-pix_fmt', 'nv12', '-f', 'rawvideo', 'nv12_ffmpeg.raw'])

nv12_ff = np.fromfile('nv12_ffmpeg.raw', np.uint8)
nv12_ff = nv12_ff.reshape(nv12.shape)

abs_diff = np.absolute(nv12.astype(np.int16) - nv12_ff.astype(np.int16)).astype(np.uint8)
max_abs_diff = abs_diff.max()

print(f'max_abs_diff = {max_abs_diff}')

plt.figure()
plt.axis('off')
plt.imshow(abs_diff, cmap='gray', interpolation='nearest')
plt.show()
################################################################################
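Because the raw output file has no header, reading nv12_output.raw back requires knowing the image size. A minimal sketch, assuming the data was written by the code above (the helper split_nv12 is hypothetical); it accepts either the flat file contents or the 2-D nv12 array:

```python
import numpy as np

def split_nv12(data, width, height):
    """Split NV12 data into Y (H, W), U (H/2, W/2) and V (H/2, W/2) planes."""
    flat = np.asarray(data, dtype=np.uint8).ravel()
    expected = width * height * 3 // 2
    if flat.size != expected:
        raise ValueError(f'expected {expected} bytes, got {flat.size}')
    y = flat[:width * height].reshape(height, width)
    uv = flat[width * height:].reshape(height // 2, width)  # interleaved u,v,u,v rows
    return y, uv[:, 0::2], uv[:, 1::2]
```

Usage with the answer's variables: y, u, v = split_nv12(np.fromfile('nv12_output.raw', np.uint8), cols, rows)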

Sample RGB input image:

NV12 result (displayed as a grayscale image):

Testing:

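For testing, we convert the same input image (rgb_input.png) to NV12 format using FFmpeg (command line tool), and compute the maximum absolute difference between the two conversions.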

The test assumes FFmpeg is in the execution path (on Windows, we can place ffmpeg.exe in the same folder as the Python script).
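If FFmpeg is missing, sp.run raises a FileNotFoundError at the point of the call. A possible sketch (ffmpeg_nv12_cmd and run_ffmpeg_reference are hypothetical helpers) checks for the executable first with shutil.which and builds the same argument list as the testing code; check=True makes a failed FFmpeg run raise CalledProcessError instead of being silently ignored.

```python
import shutil
import subprocess as sp

def ffmpeg_nv12_cmd(src, dst, color_matrix='bt709'):
    """Build the FFmpeg command from the answer as an argument list."""
    vf = (f'scale=flags=fast_bilinear:out_color_matrix={color_matrix}'
          ':out_range=tv:dst_format=nv12')
    return ['ffmpeg', '-y', '-i', src, '-vf', vf,
            '-pix_fmt', 'nv12', '-f', 'rawvideo', dst]

def run_ffmpeg_reference(src='rgb_input.png', dst='nv12_ffmpeg.raw', color_matrix='bt709'):
    """Run FFmpeg, failing early with a clear message if it is not on the PATH."""
    if shutil.which('ffmpeg') is None:
        raise FileNotFoundError('ffmpeg was not found on the PATH')
    sp.run(ffmpeg_nv12_cmd(src, dst, color_matrix), check=True)
```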

The following shell command converts rgb_input.png to NV12 format with the BT.709 color standard:

ffmpeg -y -i rgb_input.png -vf "scale=flags=fast_bilinear:out_color_matrix=bt709:out_range=tv:dst_format=nv12" -pix_fmt nv12 -f rawvideo nv12_ffmpeg.raw

Note:
fast_bilinear interpolation gives the best results for the specific input image (it applies bilinear interpolation when downscaling U and V).

The following Python code compares the NumPy result (nv12) with nv12_ffmpeg.raw:

nv12_ff = np.fromfile('nv12_ffmpeg.raw', np.uint8).reshape(nv12.shape)
abs_diff = np.absolute(nv12.astype(np.int16) - nv12_ff.astype(np.int16)).astype(np.uint8)
print(f'max_abs_diff = {abs_diff.max()}')

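For the specific input image, the maximum difference is 2 or 3 (almost identical).
For other input images the difference is larger (possibly due to incorrect FFmpeg arguments).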
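To narrow down where larger differences come from, it may help to report the maximum difference of the Y plane and of the UV plane separately: a mismatch only in UV suggests the chroma downscaling, while a mismatch in Y suggests the color matrix or range settings. A sketch, with the hypothetical helper plane_max_diffs:

```python
import numpy as np

def plane_max_diffs(nv12_a, nv12_b, rows):
    """Maximum absolute difference of the Y plane and of the UV plane separately.

    `rows` is the image height; the first `rows` rows of the (rows*3/2, cols)
    NV12 array are Y, the remaining rows are the interleaved UV plane.
    """
    diff = np.abs(nv12_a.astype(np.int16) - nv12_b.astype(np.int16))
    return int(diff[:rows].max()), int(diff[rows:].max())
```

Usage with the answer's variables: y_diff, uv_diff = plane_max_diffs(nv12, nv12_ff, rows)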

A similar question about converting sRGB to NV12 using NumPy can be found on Stack Overflow: https://stackoverflow.com/questions/57022633/
