h.264 - kVTVideoDecoderBadDataErr when using VTDecompressionSessionDecodeFrame with H264 NAL units

Tags: h.264 video-codecs video-toolbox

My code successfully extracts all NAL units of an H.264 stream packed into an AVI file. I can also parse the SPS, the PPS, and NAL unit types 1 and 5. I then extract an entire GOP (group of pictures), starting with the SPS and PPS, followed by the IDR NAL unit, and ending with the last non-IDR NAL unit before the next SPS.
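For reference, classifying the NAL units mentioned above can be sketched as follows. `nalUnitType` is a hypothetical helper (not from my actual parser) and assumes the start code has already been stripped:

```swift
// nal_unit_type is the low 5 bits of the first byte after the start code.
// 5 = IDR slice, 1 = non-IDR slice, 7 = SPS, 8 = PPS.
func nalUnitType(_ nalu: [UInt8]) -> Int {
    Int(nalu[0] & 0x1F)
}
```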

I then reorder the NAL units according to section 8.2 of the spec to obtain the correct PicOrderCnt (which I pass as the presentation timestamp, PTS).
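For pic_order_cnt_type == 0, the PicOrderCnt derivation behind this reordering (spec 8.2.1.1) can be sketched roughly like this. `PocState` is a simplified, hypothetical helper that ignores fields and MMCO 5; MaxPicOrderCntLsb = 64 is an assumption consistent with the wrap from pic_order_cnt_lsb 30 to PicOrderCnt 94 in the log below:

```swift
struct PocState {
    let maxPicOrderCntLsb: Int   // 2^(log2_max_pic_order_cnt_lsb_minus4 + 4) from the SPS
    var prevPicOrderCntMsb = 0
    var prevPicOrderCntLsb = 0

    // Must be called at every IDR picture.
    mutating func resetForIDR() {
        prevPicOrderCntMsb = 0
        prevPicOrderCntLsb = 0
    }

    // Returns PicOrderCnt for a frame picture, tracking MSB wrap-around of the lsb.
    mutating func picOrderCnt(lsb: Int) -> Int {
        let msb: Int
        if lsb < prevPicOrderCntLsb,
           prevPicOrderCntLsb - lsb >= maxPicOrderCntLsb / 2 {
            msb = prevPicOrderCntMsb + maxPicOrderCntLsb      // lsb wrapped forward
        } else if lsb > prevPicOrderCntLsb,
                  lsb - prevPicOrderCntLsb > maxPicOrderCntLsb / 2 {
            msb = prevPicOrderCntMsb - maxPicOrderCntLsb      // lsb wrapped backward
        } else {
            msb = prevPicOrderCntMsb
        }
        prevPicOrderCntMsb = msb
        prevPicOrderCntLsb = lsb
        return msb + lsb
    }
}
```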

So I have one array containing the GOP's NAL units and a second array containing the PTS of each NAL unit.

Next, I convert the start codes to AVC format (a length prefix for each NAL unit).
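This conversion can be sketched for a single NAL unit like so. `annexBToAVCC` is a minimal, hypothetical helper; the 4-byte big-endian length prefix matches the `nalUnitHeaderLength: 4` passed to the format description later:

```swift
// Replace the Annex B start code (0x000001 or 0x00000001) of one NAL unit
// with a 4-byte big-endian length prefix (AVCC format).
func annexBToAVCC(_ nalu: [UInt8]) -> [UInt8] {
    var payload = nalu
    if payload.starts(with: [0, 0, 0, 1]) {
        payload.removeFirst(4)
    } else if payload.starts(with: [0, 0, 1]) {
        payload.removeFirst(3)
    }
    let len = UInt32(payload.count)
    let prefix: [UInt8] = [
        UInt8((len >> 24) & 0xFF),
        UInt8((len >> 16) & 0xFF),
        UInt8((len >> 8) & 0xFF),
        UInt8(len & 0xFF)
    ]
    return prefix + payload
}
```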

After that, I pass the SPS and PPS, without start codes, to CMVideoFormatDescriptionCreateFromH264ParameterSets.

Then I put all NAL units ([UInt8] arrays) into a CMSampleBuffer, using the PicOrderCnt as the CMSampleTimingInfo.

With this, my code successfully decodes video frames via VTDecompressionSessionDecodeFrame.

Unfortunately, some GOPs don't work: for certain frames I get the error kVTVideoDecoderBadDataErr, and I can't explain why.

For example, my current group starts at DTS (decoding timestamp) 770, which is a keyframe. This is my debug output:

DTS: 770    | PTS: 771  | NAL-Type 5: frame_num: 0 // slice_type: 7 // pic_order_cnt_lsb: 0     |   PicOrderCnt: 0
DTS: 771    | PTS: 773  | NAL-Type 1: frame_num: 1 // slice_type: 5 // pic_order_cnt_lsb: 4     |   PicOrderCnt: 4
DTS: 772    | PTS: 772  | NAL-Type 1: frame_num: 2 // slice_type: 6 // pic_order_cnt_lsb: 2     |   PicOrderCnt: 2
DTS: 773    | PTS: 776  | NAL-Type 1: frame_num: 2 // slice_type: 5 // pic_order_cnt_lsb: 10    |   PicOrderCnt: 10
DTS: 774    | PTS: 775  | NAL-Type 1: frame_num: 3 // slice_type: 6 // pic_order_cnt_lsb: 8     |   PicOrderCnt: 8
DTS: 775    | PTS: 774  | NAL-Type 1: frame_num: 4 // slice_type: 6 // pic_order_cnt_lsb: 6     |   PicOrderCnt: 6
DTS: 776    | PTS: 779  | NAL-Type 1: frame_num: 4 // slice_type: 5 // pic_order_cnt_lsb: 16    |   PicOrderCnt: 16

...

DTS: 815    | PTS: 818  | NAL-Type 1: frame_num: 14 // slice_type: 5 // pic_order_cnt_lsb: 30   |   PicOrderCnt: 94
DTS: 816    | PTS: 817  | NAL-Type 1: frame_num: 15 // slice_type: 6 // pic_order_cnt_lsb: 28   |   PicOrderCnt: 92
DTS: 817    | PTS: 816  | NAL-Type 1: frame_num: 0 // slice_type: 6 // pic_order_cnt_lsb: 26    |   PicOrderCnt: 90
DTS: 818    | PTS: 821  | NAL-Type 1: frame_num: 0 // slice_type: 5 // pic_order_cnt_lsb: 36    |   PicOrderCnt: 100
DTS: 819    | PTS: 820  | NAL-Type 1: frame_num: 1 // slice_type: 6 // pic_order_cnt_lsb: 34    |   PicOrderCnt: 98
DTS: 820    | PTS: 819  | NAL-Type 1: frame_num: 2 // slice_type: 6 // pic_order_cnt_lsb: 32    |   PicOrderCnt: 96
DTS: 821    | PTS: 824  | NAL-Type 1: frame_num: 2 // slice_type: 5 // pic_order_cnt_lsb: 42    |   PicOrderCnt: 106
DTS: 822    | PTS: 823  | NAL-Type 1: frame_num: 3 // slice_type: 6 // pic_order_cnt_lsb: 40    |   PicOrderCnt: 104
DTS: 823    | PTS: 822  | NAL-Type 1: frame_num: 4 // slice_type: 6 // pic_order_cnt_lsb: 38    |   PicOrderCnt: 102
DTS: 824    | PTS: 827  | NAL-Type 1: frame_num: 4 // slice_type: 5 // pic_order_cnt_lsb: 48    |   PicOrderCnt: 112
DTS: 825    | PTS: 826  | NAL-Type 1: frame_num: 5 // slice_type: 6 // pic_order_cnt_lsb: 46    |   PicOrderCnt: 110

The errors start at PTS 819.

Here is my code:

func decodeGroup(_ group: AviH264Analyzer.GOP, fps: Double) {
    DispatchQueue(label: "decode").async {
                                                    
        let sps = group.spsNAL.bytesWithoutStartCode
        let pps = group.ppsNAL.bytesWithoutStartCode
        
        var formatDesc: CMVideoFormatDescription?
        
        var status = sps.withUnsafeBufferPointer { spsPtr in
            pps.withUnsafeBufferPointer { ppsPtr in
                let paramSet = [spsPtr.baseAddress!, ppsPtr.baseAddress!]
                let paramSizes = [sps.count, pps.count]
                return paramSet.withUnsafeBufferPointer { paramSetPtr in
                    paramSizes.withUnsafeBufferPointer { paramSizesPtr in
                        CMVideoFormatDescriptionCreateFromH264ParameterSets(allocator: nil,
                                                                            parameterSetCount: 2,
                                                                            parameterSetPointers: paramSetPtr.baseAddress!,
                                                                            parameterSetSizes: paramSizesPtr.baseAddress!,
                                                                            nalUnitHeaderLength: 4,
                                                                            formatDescriptionOut: &formatDesc)
                    }
                }
                
            }
        }
        
        var callback = VTDecompressionOutputCallbackRecord()
        callback.decompressionOutputCallback = { (_, _, status, _, imageBuffer, pts, _) in
            if let imageBuffer {
                let ciImage = CIImage(cvImageBuffer: imageBuffer)
                if let cgImage = CIContext().createCGImage(ciImage, from: ciImage.extent)  {
                    let rep = NSBitmapImageRep(cgImage: cgImage)
                    if let imgData = rep.representation(using: .png, properties: [:]), let nsImage = NSImage(data: imgData) {
                        let frameNumber = Int(pts.value)-1
                        if !VideoBuffer.shared.buffer.map({ $0.frameNumber }).contains(frameNumber) {
                            VideoBuffer.shared.buffer.append(VideoFrame(frameNumber: frameNumber, image: nsImage))
                        }
                    }
                }
            }
        }
        
        let decoderParameters = NSMutableDictionary()
        let destinationPixelBufferAttributes = NSMutableDictionary()
        destinationPixelBufferAttributes.setValue(
            NSNumber(value: kCVPixelFormatType_32ARGB),
            forKey: kCVPixelBufferPixelFormatTypeKey as String
        )
        
        var decompressionSession: VTDecompressionSession?
        status = VTDecompressionSessionCreate(allocator: kCFAllocatorDefault,
                                              formatDescription: formatDesc!,
                                              decoderSpecification: decoderParameters,
                                              imageBufferAttributes: destinationPixelBufferAttributes,
                                              outputCallback: &callback,
                                              decompressionSessionOut: &decompressionSession)
        if status != noErr {
            handleStatus(status)
        } else {
            print("DecompressionSession successfully created")
        }
            
        let nalus = group.nalus
        
        self.decodeNALUnits(nalus: nalus, 
                            order: group.order,
                            fps: fps,
                            formatDesc: formatDesc!,
                            decompressionSession: decompressionSession!)
    }
}

func decodeNALUnits(nalus: [PictureNAL], order: [Int], fps: Double, formatDesc: CMVideoFormatDescription, decompressionSession: VTDecompressionSession) {
    var videoData = [UInt8]()
    var sampleSizeArray = [Int]()
    for nalu in nalus {
        var bytes = nalu.bytesWithoutStartCode
        
        // the length of the NALU
        var bigLen = CFSwapInt32HostToBig(UInt32(bytes.count))
        bytes.insert(contentsOf: withUnsafeBytes(of: &bigLen, { Array($0) }), at: 0)
        videoData += bytes
        
        sampleSizeArray.append(bytes.count)
    }
    
    var blockBuffer: CMBlockBuffer?
    let count = videoData.count
    // Let the block buffer allocate and own its memory, then copy the data in.
    // Backing it with the Swift array's storage (kCFAllocatorNull) would leave the
    // buffer pointing at memory that may be gone while async decoding is in flight.
    var status = CMBlockBufferCreateWithMemoryBlock(allocator: kCFAllocatorDefault,
                                                    memoryBlock: nil,
                                                    blockLength: count,
                                                    blockAllocator: kCFAllocatorDefault,
                                                    customBlockSource: nil,
                                                    offsetToData: 0,
                                                    dataLength: count,
                                                    flags: kCMBlockBufferAssureMemoryNowFlag,
                                                    blockBufferOut: &blockBuffer)
    if status == noErr, let blockBuffer {
        status = CMBlockBufferReplaceDataBytes(with: videoData,
                                               blockBuffer: blockBuffer,
                                               offsetIntoDestination: 0,
                                               dataLength: count)
    }
    
    if status != noErr {
        handleStatus(status)
    } else {
        print("CMBlockBufferCreateWithMemoryBlock success")
    }
    
    let frameDuration = CMTimeMake(value: 1, timescale: Int32(fps))
    
    var timing = [CMSampleTimingInfo]()
    
    for i in 0..<nalus.count {
        let pts = order[i]
        let presentationTime = CMTimeMake(value: Int64(pts), timescale: Int32(fps))
        let timingInfo = CMSampleTimingInfo(duration: frameDuration,
                                            presentationTimeStamp: presentationTime,
                                            decodeTimeStamp: CMTime.invalid)
        timing.append(timingInfo)
    }
    
    var sampleBuffer: CMSampleBuffer?
    status = CMSampleBufferCreateReady(allocator: kCFAllocatorDefault,
                                       dataBuffer: blockBuffer,
                                       formatDescription: formatDesc,
                                       sampleCount: sampleSizeArray.count,
                                       sampleTimingEntryCount: timing.count,
                                       sampleTimingArray: &timing,
                                       sampleSizeEntryCount: sampleSizeArray.count,
                                       sampleSizeArray: sampleSizeArray,
                                       sampleBufferOut: &sampleBuffer)
    if status != noErr {
        handleStatus(status)
    } else {
        print("CMSampleBufferCreateReady success")
    }
    
    guard let buffer = sampleBuffer else {
        print("Could not unwrap sampleBuffer!")
        return
    }
    
    // frameRefcon is unused by the output callback, so pass nil instead of a
    // pointer to a local variable that would dangle during asynchronous decode.
    status = VTDecompressionSessionDecodeFrame(decompressionSession,
                                               sampleBuffer: buffer,
                                               flags: [._EnableAsynchronousDecompression, ._EnableTemporalProcessing],
                                               frameRefcon: nil,
                                               infoFlagsOut: nil)
    
    if status != noErr {
        print(status)
        handleStatus(status)
    } else {
        print("VTDecompressionSessionDecodeFrame success")
    }
}

Best answer

I can hardly believe it... I found the solution by comparing the NAL units from my own parser with those parsed by libav. The only difference I could find was that my parser removes the emulation prevention bytes (section 7.3.1 of the H.264 spec). After some trial and error, here is my solution:

  1. The SPS and PPS must be passed to CMVideoFormatDescriptionCreateFromH264ParameterSets without emulation prevention bytes (remove the 0x03 from every 0x000003 sequence) and without start codes (0x000001 or 0x00000001).

  2. The VCL NAL units must be passed to VTDecompressionSessionDecodeFrame in AVCC format and with emulation prevention bytes kept in place.
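Removing the emulation prevention bytes for step 1 can be sketched like this. `removeEmulationPrevention` is a hypothetical helper; the input is assumed to already have its start code stripped:

```swift
// Strip the emulation prevention byte (the 0x03 in every 0x00 0x00 0x03
// sequence, spec 7.3.1/7.4.1) from an SPS/PPS payload, turning the escaped
// byte stream back into raw RBSP bytes.
func removeEmulationPrevention(_ ebsp: [UInt8]) -> [UInt8] {
    var out = [UInt8]()
    out.reserveCapacity(ebsp.count)
    var zeroRun = 0
    for byte in ebsp {
        if zeroRun >= 2 && byte == 0x03 {
            zeroRun = 0          // drop the emulation prevention byte
            continue
        }
        zeroRun = (byte == 0x00) ? zeroRun + 1 : 0
        out.append(byte)
    }
    return out
}
```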

Based on a similar question on Stack Overflow: https://stackoverflow.com/questions/76281273/
