swift - Multiple CALayer updates for a single Vision/CoreML inference

Tags: swift avcapturesession coreml avkit apple-vision

I am trying to use Vision and CoreML to perform style transfer on a tracked object as close to real time as possible. I am using AVKit to capture video, and AVCaptureVideoDataOutputSampleBufferDelegate to get each frame.

At a high level, my pipeline is:

1) Detect faces

2) Update the preview layer to draw bounding boxes at the appropriate screen positions

3) Crop the original image to the detected face

4) Run the face image through the CoreML model, which outputs a new image

5) Fill the preview layers (wherever they are) with the new images

I would like the bounding boxes to be placed (on the main thread) as soon as they are computed, and then filled in once inference completes. However, I am finding that when I add the CoreML inference to the pipeline (on the AVCaptureOutputQueue or the CoreMLQueue), the bounding boxes do not update their position until the inference finishes. Maybe I am missing something about how queues are handled in closures. The (hopefully) relevant parts of the code are below.

I am modifying the code from https://developer.apple.com/documentation/vision/tracking_the_user_s_face_in_real_time.
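
For context, the video data output and its delegate queue are configured along the same lines as that sample; a simplified sketch (the names here are placeholders, not my exact code):

import AVFoundation

// Simplified sketch of the capture-output setup; names are placeholders.
func configureVideoOutput(for session: AVCaptureSession,
                          delegate: AVCaptureVideoDataOutputSampleBufferDelegate) {
    let videoDataOutput = AVCaptureVideoDataOutput()

    // Drop frames that arrive while the delegate is still busy, so stale
    // frames don't pile up behind a slow CoreML inference.
    videoDataOutput.alwaysDiscardsLateVideoFrames = true

    // A dedicated serial queue for captureOutput(_:didOutput:from:); UI work
    // is dispatched from there to the main queue, inference to CoreMLQueue.
    let videoDataOutputQueue = DispatchQueue(label: "com.example.videoDataOutputQueue")
    videoDataOutput.setSampleBufferDelegate(delegate, queue: videoDataOutputQueue)

    if session.canAddOutput(videoDataOutput) {
        session.addOutput(videoDataOutput)
    }
}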

public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
    from connection: AVCaptureConnection) {
    // omitting stuff that gets pixelBuffers etc formatted for use with Vision
    // and sets up tracking requests

    // Perform landmark detection on tracked faces
    for trackingRequest in newTrackingRequests {

        let faceLandmarksRequest = VNDetectFaceLandmarksRequest(completionHandler: { (request, error) in

            guard let landmarksRequest = request as? VNDetectFaceLandmarksRequest,
                let results = landmarksRequest.results as? [VNFaceObservation] else {
                    return
            }

            // Perform all UI updates (drawing) on the main queue,
            // not the background queue on which this handler is being called.

            DispatchQueue.main.async {
                self.drawFaceObservations(results) // <<- places the bounding box on the preview layer
            }

            CoreMLQueue.async { // queue for CoreML work

                // get the region of the picture to crop for CoreML
                let boundingBox = results[0].boundingBox

                // crop the input frame to the detected object
                let image: CVPixelBuffer = self.cropFrame(pixelBuffer: pixelBuffer, region: boundingBox)

                // run inference on the cropped region
                let styleImage: CGImage = self.performCoreMLInference(on: image)

                // on the main thread, place styleImage into the bounding box (CAShapeLayer)
                DispatchQueue.main.async {
                    self.boundingBoxOverlayLayer?.contents = styleImage
                }
            }
        })

        do {
            try requestHandler.perform(faceLandmarksRequest)
        } catch let error as NSError {
            NSLog("Failed Request: %@", error)
        }
    }
}

Aside from the queue/synchronization issue, I think one cause of the slowdown could be cropping the pixel buffer to the region of interest. I am out of ideas here; any help would be appreciated.
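
For reference, a Core Image based version of cropFrame would look roughly like the sketch below (not necessarily what my implementation does). VNImageRectForNormalizedRect converts the normalized, lower-left-origin Vision bounding box into pixel coordinates, and the CIContext is created once and reused, since recreating it per frame is itself a common source of slowdowns:

import CoreImage
import CoreVideo
import Vision

let ciContext = CIContext() // reused across frames; creating one per frame is expensive

func cropFrame(pixelBuffer: CVPixelBuffer, region: CGRect) -> CVPixelBuffer? {
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)

    // Vision bounding boxes are normalized with a lower-left origin;
    // convert to pixel coordinates in the buffer's space.
    let cropRect = VNImageRectForNormalizedRect(region, width, height).integral

    // Crop, then shift the cropped region back to the origin.
    let cropped = CIImage(cvPixelBuffer: pixelBuffer)
        .cropped(to: cropRect)
        .transformed(by: CGAffineTransform(translationX: -cropRect.minX, y: -cropRect.minY))

    var output: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault,
                        Int(cropRect.width), Int(cropRect.height),
                        kCVPixelFormatType_32BGRA, nil, &output)
    guard let outBuffer = output else { return nil }

    ciContext.render(cropped, to: outBuffer)
    return outBuffer
}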

Best answer

I am using the pipeline from https://github.com/maxvol/RxAVFoundation together with https://github.com/maxvol/RxVision to solve the synchronization issue.

A basic example:

let textRequest: RxVNDetectTextRectanglesRequest<CVPixelBuffer> = VNDetectTextRectanglesRequest.rx.request(reportCharacterBoxes: true)
var session = AVCaptureSession.rx.session()
var requests = [RxVNRequest<CVPixelBuffer>]()

self.requests = [self.textRequest]
self
  .textRequest
  .observable
  .observeOn(Scheduler.main)
  .subscribe { [unowned self] (event) in
      switch event {
      case .next(let completion):
          self.detectTextHandler(value: completion.value, request: completion.request, error: completion.error)
      default:
          break
      }
  }
  .disposed(by: disposeBag)

self.session
  .flatMapLatest { [unowned self] (session) -> Observable<CaptureOutput> in
      let imageLayer = session.previewLayer
      imageLayer.frame = self.imageView.bounds
      self.imageView.layer.addSublayer(imageLayer)
      return session.captureOutput
  }
  .subscribe { [unowned self] (event) in
      switch event {
      case .next(let captureOutput):
          guard let pixelBuffer = CMSampleBufferGetImageBuffer(captureOutput.sampleBuffer) else {
              return
          }
          var requestOptions: [VNImageOption: Any] = [:]
          if let camData = CMGetAttachment(captureOutput.sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) {
              requestOptions = [.cameraIntrinsics: camData]
          }
          let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: requestOptions)
          do {
              try imageRequestHandler.rx.perform(self.requests, with: pixelBuffer)
          } catch {
              os_log("error: %@", "\(error)")
          }
          break
      case .error(let error):
          os_log("error: %@", "\(error)")
          break
      case .completed:
          // never happens
          break
      }
  }
  .disposed(by: disposeBag)
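
Note that the request is generic over CVPixelBuffer and perform(_:with:) is called with the same buffer that produced the observations, so each completion arrives already paired with its source frame; that pairing is what keeps the downstream layer updates consistent with the inference.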

This question about multiple CALayer updates for a single Vision/CoreML inference in Swift was originally asked on Stack Overflow: https://stackoverflow.com/questions/51390204/
