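I'm using TinySoundFont to play SF2 files on watchOS. I want to play the raw audio the framework generates in real time (that is, call tsf_note_on as soon as the corresponding button is pressed, and call tsf_render_short as soon as new data is needed). I'm using an AVAudioSourceNode to do this.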

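Although the sound renders fine when I write it to a file, it is really noisy when played through the AVAudioSourceNode. (Based on the answer from Rob Napier, this may be because I'm ignoring the timestamp property; I'm looking for a way to fix that.) What causes this problem, and how can I fix it?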


import SwiftUI
import AVFoundation

struct ContentView: View {
    @ObservedObject var settings = Settings.shared

    init() {
        settings.prepare()
    }

    var body: some View {
        Button("Play Sound") {
            settings.playSound()

            if !settings.engine.isRunning {
                do {
                    try settings.engine.start()
                } catch {
                    print(error)
                }
            }
        }
    }
}

import SwiftUI
import AVFoundation

class Settings: ObservableObject {
    static let shared = Settings()

    var engine: AVAudioEngine!
    var sourceNode: AVAudioSourceNode!

    var tinySoundFont: OpaquePointer!

    func prepare() {
        let soundFontPath = Bundle.main.path(forResource: "GMGSx", ofType: "sf2")
        tinySoundFont = tsf_load_filename(soundFontPath)
        tsf_set_output(tinySoundFont, TSF_MONO, 44100, 0)

        setUpSound()
    }

    func setUpSound() {
        if let engine = engine,
           let sourceNode = sourceNode {
            engine.detach(sourceNode)
        }

        engine = .init()

        let mixerNode = engine.mainMixerNode

        let audioFormat = AVAudioFormat(
            commonFormat: .pcmFormatInt16,
            sampleRate: 44100,
            channels: 1,
            interleaved: false
        )

        guard let audioFormat = audioFormat else {
            return
        }

        sourceNode = AVAudioSourceNode(format: audioFormat) { silence, timeStamp, frameCount, audioBufferList in
            guard let data = self.getSound(length: Int(frameCount)) else {
                return 1
            }

            let ablPointer = UnsafeMutableAudioBufferListPointer(audioBufferList)

            data.withUnsafeBytes { (intPointer: UnsafePointer<Int16>) in
                for index in 0 ..< Int(frameCount) {
                    let value = intPointer[index]

                    // Set the same value on all channels (due to the inputFormat, there's only one channel though).
                    for buffer in ablPointer {
                        let buf: UnsafeMutableBufferPointer<Int16> = UnsafeMutableBufferPointer(buffer)
                        buf[index] = value
                    }
                }
            }

            return noErr
        }

        engine.attach(sourceNode)
        engine.connect(sourceNode, to: mixerNode, format: audioFormat)

        do {
            try AVAudioSession.sharedInstance().setCategory(.playback)
        } catch {
            print(error)
        }
    }

    func playSound() {
        tsf_note_on(tinySoundFont, 0, 60, 1)
    }

    func getSound(length: Int) -> Data? {
        let array = [Int16]()
        var storage = UnsafeMutablePointer<Int16>.allocate(capacity: length)
        storage.initialize(from: array, count: length)

        tsf_render_short(tinySoundFont, storage, Int32(length), 0)
        let data = Data(bytes: storage, count: length)

        storage.deallocate()

        return data
    }
}

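The AVAudioSourceNode initializer takes a render block. In the mode you're using (live playback), this is a real-time callback, so you have a very tight deadline to fill the block with the requested data and return it so it can be played. You don't have much time to do calculations. You definitely don't have time to access the file system.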
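One common way to respect that deadline is to do the expensive rendering elsewhere and have the render block only copy from memory that was allocated up front. The following is a minimal sketch of that idea, not code from the project: `SampleFIFO` and its methods are hypothetical names, and it is not thread-safe as written (a real render path would need a lock-free single-producer, single-consumer design).

```swift
// Hypothetical fixed-capacity FIFO of Float samples. All storage is allocated
// once in init, so `read` never allocates. NOTE: this sketch is not
// thread-safe; it only illustrates "copy from preallocated memory".
struct SampleFIFO {
    private var storage: [Float]
    private var readIndex = 0
    private var count = 0

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        storage = [Float](repeating: 0, count: capacity)
    }

    // Producer side: store as many samples as fit; returns how many were stored.
    mutating func write(_ samples: [Float]) -> Int {
        let writable = min(samples.count, storage.count - count)
        for i in 0..<writable {
            storage[(readIndex + count + i) % storage.count] = samples[i]
        }
        count += writable
        return writable
    }

    // Consumer side (what a render block would call): fill `output`,
    // padding with silence when too few samples are available.
    mutating func read(into output: UnsafeMutableBufferPointer<Float>) {
        let readable = min(output.count, count)
        for i in 0..<readable {
            output[i] = storage[(readIndex + i) % storage.count]
        }
        for i in readable..<output.count {
            output[i] = 0
        }
        readIndex = (readIndex + readable) % storage.count
        count -= readable
    }
}
```

The point of the design is that the consumer side does a bounded amount of copying and nothing else: no allocation, no file access, no synthesis.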

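In your block, you recompute the entire WAV on every render cycle, then write it to disk, then read it back from disk, and only then fill the requested block. You ignore the requested timestamp and always fill the buffer starting at sample zero. That mismatch is what causes the buzzing. The fact that you're so slow about it is probably what causes the drop in pitch.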
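To make the "starting at sample zero" problem concrete: each render call should continue where the previous one stopped. The sketch below is an illustration under assumptions, not the project's code. `PCMCursor` is a hypothetical helper, and it tracks its own running position instead of reading the `AudioTimeStamp`, which assumes the render calls are contiguous.

```swift
// Hypothetical helper: hands out consecutive slices of pre-rendered samples.
struct PCMCursor {
    let samples: [Int16]          // fully decoded PCM, held in memory
    private(set) var position = 0 // index of the next sample to play

    init(samples: [Int16]) {
        self.samples = samples
    }

    // Copy the next `frameCount` samples into `output`, continuing from the
    // previous call. Past the end of `samples`, the output is silence.
    mutating func fill(_ output: UnsafeMutableBufferPointer<Int16>, frameCount: Int) {
        let frames = min(frameCount, output.count)
        for i in 0..<frames {
            let index = position + i
            output[i] = index < samples.count ? samples[index] : 0
        }
        position += frames
    }
}
```

A block that always copied from index 0 would replay the same first few milliseconds on every cycle, which is heard as a buzz at the callback rate.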

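Depending on the size of your files, the simplest way to do this is to decode everything into memory first, and then fill the buffer for the requested timestamp and length. It looks like your C code already generates PCM data, so there's no need to convert it to a WAV file. It already seems to be in the right format.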
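When holding rendered PCM in memory, it may also be worth keeping sample counts and byte counts apart: `Data(bytes:count:)` takes its count in bytes, so `length` Int16 samples occupy `length * MemoryLayout<Int16>.stride` bytes. The sketch below renders into a Swift array, which also avoids manual allocation and deallocation. `renderShort` is a hypothetical stand-in for `tsf_render_short` so that the example is self-contained; `renderSamples` and `makeData` are likewise illustrative names.

```swift
import Foundation

// Hypothetical stand-in for tsf_render_short: writes `count` samples of a ramp.
func renderShort(_ buffer: UnsafeMutablePointer<Int16>, _ count: Int) {
    for i in 0..<count {
        buffer[i] = Int16(i)
    }
}

// Render `length` samples into a zero-initialized array.
func renderSamples(length: Int) -> [Int16] {
    var samples = [Int16](repeating: 0, count: length)
    samples.withUnsafeMutableBufferPointer { buffer in
        if let base = buffer.baseAddress {
            renderShort(base, buffer.count)
        }
    }
    return samples
}

// Wrap the samples in Data. Data(buffer:) copies every element, so the
// resulting byte count is samples.count * MemoryLayout<Int16>.stride.
func makeData(from samples: [Int16]) -> Data {
    return samples.withUnsafeBufferPointer { Data(buffer: $0) }
}
```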

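Apple provides a good sample project, Signal Generator, that you should use as a starting point. Download it and make sure it works as expected. Then swap in your SF2 code. You may also find the video on this helpful: What's New in AVAudioEngine.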

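The simplest tool to use here is probably AVAudioPlayerNode. Your SoundFontHelper makes things more complicated, so I've removed it and call TSF directly from Swift. To do that, create a file named tsf.c as follows: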

#define TSF_IMPLEMENTATION
#include "tsf.h"

Then expose it to Swift through the bridging header:

#import "tsf.h"

Simplify ContentView to this:

import SwiftUI

struct ContentView: View {
    @ObservedObject var settings = Settings.shared

    init() {
        // You'll want error handling here.
        try! settings.prepare()
    }

    var body: some View {
        Button("Play Sound") {
            settings.play()
        }
    }
}

And update Settings to this:

import SwiftUI
import AVFoundation

class Settings: ObservableObject {
    static let shared = Settings()

    var engine = AVAudioEngine()
    let playerNode = AVAudioPlayerNode()
    var tsf: OpaquePointer
    var outputFormat = AVAudioFormat()

    init() {
        let soundFontPath = Bundle.main.path(forResource: "GMGSx", ofType: "sf2")
        tsf = tsf_load_filename(soundFontPath)

        // A node must be attached to the engine before it can be connected.
        engine.attach(playerNode)
        engine.connect(playerNode, to: engine.mainMixerNode, format: nil)

        updateOutputFormat()
    }

    // For simplicity, this object assumes the outputFormat does not change during its lifetime.
    // It's important to watch for route changes, and recreate this object if they occur. For details, see:
    // https://developer.apple.com/documentation/avfaudio/avaudiosession/responding_to_audio_session_route_changes
    func updateOutputFormat() {
        outputFormat = engine.mainMixerNode.outputFormat(forBus: 0)
    }

    func prepare() throws {
        // Start the engine
        try AVAudioSession.sharedInstance().setCategory(.playback)
        try engine.start()
        playerNode.play()

        updateOutputFormat()

        // Configure TSF. The only important thing here is the sample rate, which can be different on different hardware.
        // Core Audio has a defined format of "deinterleaved 32-bit floating point."
        tsf_set_output(tsf,
                       TSF_STEREO_UNWEAVED,            // mode
                       Int32(outputFormat.sampleRate), // sampleRate
                       0)                              // gain
    }

    func play() {
        // Start the note
        tsf_note_on(tsf,
                    0,   // preset_index
                    60,  // key (middle C)
                    1.0) // velocity

        // These tones have a long falloff, so you want a lot of source data. This is 10s.
        let frameCount = 10 * Int(outputFormat.sampleRate)

        // Create a buffer for the samples
        let buffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: AVAudioFrameCount(frameCount))!
        buffer.frameLength = buffer.frameCapacity

        // Render the samples. Do not mix. This buffer has been extended to
        // the needed size by the assignment to `frameLength` above. The call to
        // `assumingMemoryBound` is known to be correct because the format is Float32.
        let ptr = buffer.audioBufferList.pointee.mBuffers.mData?.assumingMemoryBound(to: Float.self)
        tsf_render_float(tsf,
                         ptr,                // buffer
                         Int32(frameCount),  // samples
                         0)                  // mixing (do not mix)

        // All done. Play the buffer, interrupting whatever is currently playing
        playerNode.scheduleBuffer(buffer, at: nil, options: .interrupts)
    }
}
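
For reference, `TSF_STEREO_UNWEAVED` asks TSF to write all left-channel samples first and then all right-channel samples, rather than alternating left/right pairs. The sketch below shows that layout by converting an interleaved array. `deinterleave` is a hypothetical helper for illustration only and is not part of TSF or AVFoundation.

```swift
// Convert interleaved stereo [L0, R0, L1, R1, ...] into the "unweaved"
// layout [L0, L1, ..., R0, R1, ...]. A trailing unpaired sample is dropped.
func deinterleave(_ interleaved: [Float]) -> [Float] {
    let frames = interleaved.count / 2
    var result = [Float](repeating: 0, count: frames * 2)
    for frame in 0..<frames {
        result[frame] = interleaved[2 * frame]               // left channel
        result[frames + frame] = interleaved[2 * frame + 1]  // right channel
    }
    return result
}
```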

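You can find the full version in my fork. You can also look at the first commit, which is another approach that keeps SoundFontHelper and converts its output to work with it, but it's much simpler to render the audio correctly in the first place.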

