I'm trying to use Vision and CoreML to perform style transfer on tracked objects as close to real time as possible. I'm using AVKit to capture video, and AVCaptureVideoDataOutputSampleBufferDelegate to get each frame.
At a high level, my pipeline is:
1) Detect faces
2) Update the preview layer to draw bounding boxes at the appropriate screen positions
3) Crop the original image to the detected faces
4) Run each face image through the CoreML model, getting a new image as output
5) Fill the preview layer with the new images (wherever the boxes are)
I'd like the bounding boxes to be placed as soon as they are computed (on the main thread), and then filled in once inference completes. However, I've found that when I add the CoreML inference to the pipeline (on the AVCaptureOutputQueue or on the CoreMLQueue), the bounding boxes don't update their positions until inference completes. Maybe I'm missing something about how queues are handled in closures. The (hopefully) relevant parts of the code are below.
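For reference, the decoupling I'm describing could be sketched like this (a minimal sketch, not working code: the `isInferring` frame-dropping flag is an assumption, and `drawFaceObservations`, `cropFrame`, and `performCoreMLInference` are the helpers from my code below):

```swift
// Sketch: draw boxes immediately, run inference separately, and drop
// frames if the CoreML queue is still busy, so drawing never waits.
let coreMLQueue = DispatchQueue(label: "CoreMLQueue")
var isInferring = false // only touched on coreMLQueue (assumption)

func handle(observations: [VNFaceObservation], pixelBuffer: CVPixelBuffer) {
    // 1) Draw bounding boxes right away on the main thread,
    //    independent of inference.
    DispatchQueue.main.async {
        self.drawFaceObservations(observations)
    }
    // 2) Hand the frame to the inference queue; skip this frame if the
    //    previous inference hasn't finished yet.
    coreMLQueue.async {
        guard !self.isInferring,
              let box = observations.first?.boundingBox else { return }
        self.isInferring = true
        let cropped = self.cropFrame(pixelBuffer: pixelBuffer, region: box)
        let styled = self.performCoreMLInference(on: cropped)
        self.isInferring = false
        // 3) Only the fill step goes back to the main thread.
        DispatchQueue.main.async {
            self.boundingBoxOverlayLayer?.contents = styled
        }
    }
}
```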
I'm modifying the code from https://developer.apple.com/documentation/vision/tracking_the_user_s_face_in_real_time.
public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
                          from connection: AVCaptureConnection) {
    // omitting stuff that gets pixelBuffers etc. formatted for use with Vision
    // and sets up tracking requests

    // Perform landmark detection on tracked faces
    for trackingRequest in newTrackingRequests {
        let faceLandmarksRequest = VNDetectFaceLandmarksRequest(completionHandler: { (request, error) in
            guard let landmarksRequest = request as? VNDetectFaceLandmarksRequest,
                  let results = landmarksRequest.results as? [VNFaceObservation],
                  let firstFace = results.first else {
                return
            }
            // Perform all UI updates (drawing) on the main queue,
            // not the background queue on which this handler is being called.
            DispatchQueue.main.async {
                self.drawFaceObservations(results) // <<- places bounding boxes on the preview layer
            }
            CoreMLQueue.async { // queue for CoreML use
                // region of the picture to crop for CoreML
                let boundingBox = firstFace.boundingBox
                // crop the input frame to the detected object
                let image: CVPixelBuffer = self.cropFrame(pixelBuffer: pixelBuffer, region: boundingBox)
                // infer on the region
                let styleImage: CGImage = self.performCoreMLInference(on: image)
                // on the main thread, place styleImage into the bounding box (CAShapeLayer)
                DispatchQueue.main.async {
                    self.boundingBoxOverlayLayer?.contents = styleImage
                }
            }
        })
        do {
            try requestHandler.perform([faceLandmarksRequest])
        } catch let error as NSError {
            NSLog("Failed Request: %@", error)
        }
    }
}
Besides the queue/synchronization issue, I think one source of the slowdown might be cropping the pixel buffer to the region of interest. I'm out of ideas here; any help would be appreciated.
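For the cropping step, one option worth trying is Core Image, which can crop lazily instead of copying pixel-buffer bytes by hand. A minimal sketch, assuming the bounding box is Vision's normalized rect (0...1, bottom-left origin; `croppedFace` is a hypothetical helper name):

```swift
import CoreImage
import CoreVideo

// Reuse one CIContext; creating a context per frame is expensive.
let ciContext = CIContext()

func croppedFace(from pixelBuffer: CVPixelBuffer, normalizedRect: CGRect) -> CGImage? {
    let width = CGFloat(CVPixelBufferGetWidth(pixelBuffer))
    let height = CGFloat(CVPixelBufferGetHeight(pixelBuffer))
    // Scale the normalized Vision rect up to pixel coordinates.
    let pixelRect = CGRect(x: normalizedRect.origin.x * width,
                           y: normalizedRect.origin.y * height,
                           width: normalizedRect.width * width,
                           height: normalizedRect.height * height)
    // cropped(to:) is lazy; rendering happens once, in createCGImage.
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer).cropped(to: pixelRect)
    return ciContext.createCGImage(ciImage, from: ciImage.extent)
}
```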
Best answer
I'm using the pipeline from https://github.com/maxvol/RxAVFoundation together with https://github.com/maxvol/RxVision to solve the synchronization problem.
A basic example:
let textRequest: RxVNDetectTextRectanglesRequest<CVPixelBuffer> = VNDetectTextRectanglesRequest.rx.request(reportCharacterBoxes: true)
var session = AVCaptureSession.rx.session()
var requests = [RxVNRequest<CVPixelBuffer>]()

self.requests = [self.textRequest]

self
    .textRequest
    .observable
    .observeOn(Scheduler.main)
    .subscribe { [unowned self] (event) in
        switch event {
        case .next(let completion):
            self.detectTextHandler(value: completion.value, request: completion.request, error: completion.error)
        default:
            break
        }
    }
    .disposed(by: disposeBag)

self.session
    .flatMapLatest { [unowned self] (session) -> Observable<CaptureOutput> in
        let imageLayer = session.previewLayer
        imageLayer.frame = self.imageView.bounds
        self.imageView.layer.addSublayer(imageLayer)
        return session.captureOutput
    }
    .subscribe { [unowned self] (event) in
        switch event {
        case .next(let captureOutput):
            guard let pixelBuffer = CMSampleBufferGetImageBuffer(captureOutput.sampleBuffer) else {
                return
            }
            var requestOptions: [VNImageOption: Any] = [:]
            if let camData = CMGetAttachment(captureOutput.sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) {
                requestOptions = [.cameraIntrinsics: camData]
            }
            let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: requestOptions)
            do {
                try imageRequestHandler.rx.perform(self.requests, with: pixelBuffer)
            } catch {
                os_log("error: %@", "\(error)")
            }
        case .error(let error):
            os_log("error: %@", "\(error)")
        case .completed:
            // never happens
            break
        }
    }
    .disposed(by: disposeBag)
Regarding "swift - Multiple CALayer updates for a single Vision/CoreML inference", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51390204/