audio - Definitions of audio codec terminology

While reading the Cocoa Audio Queue documentation, I came across several audio codec terms. They are defined in a structure named AudioStreamBasicDescription.

Here are the terms:
1. Sample rate
2. Packet
3. Frame
4. Channel

I understand sample rate and channel, but I am confused by the other two. What do packet and frame mean?

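Feel free to answer with an example. For instance, I have a two-channel PCM-16 source with a sample rate of 44.1 kHz, which means there are 2*44100 = 88200 samples of PCM data per second (176400 bytes at 2 bytes per sample). But how do packet and frame fit in?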

Thanks in advance!

Best answer

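You are already familiar with the definition of sample rate.
The sampling frequency or sampling rate fs is defined as the number of samples obtained in one second (samples per second), so fs = 1/T, where T is the time between two consecutive samples.
So for a sample rate of 44100 Hz, you have 44100 samples per second (per audio channel).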

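The number of frames per second in video is a similar concept to the number of samples per second in audio: frames for our eyes, samples for our ears. More information here.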

If you have 16-bit stereo PCM, that means you have 16*44100*2 = 1411200 bits per second => ~172 kB per second => about 10 MB per minute.
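The arithmetic above can be checked with a short sketch. The function and constant names are made up for illustration, and the kB/MB figures are assumed to be binary units (1024-based), which is what reproduces the ~172 and ~10 values.

```python
# Illustrative arithmetic for uncompressed 16-bit stereo PCM at 44.1 kHz.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
CHANNELS = 2              # stereo
BITS_PER_SAMPLE = 16      # PCM-16


def pcm_data_rate(sample_rate_hz, channels, bits_per_sample):
    """Return (bits per second, bytes per second) for uncompressed PCM."""
    bits_per_second = sample_rate_hz * channels * bits_per_sample
    return bits_per_second, bits_per_second // 8


bits_per_second, bytes_per_second = pcm_data_rate(
    SAMPLE_RATE_HZ, CHANNELS, BITS_PER_SAMPLE
)
# Assumption: "kB" and "MB" in the text are 1024-based units.
kib_per_second = bytes_per_second / 1024
mib_per_minute = bytes_per_second * 60 / (1024 * 1024)

print(bits_per_second)            # 1411200
print(bytes_per_second)           # 176400
print(round(kib_per_second, 1))   # 172.3
print(round(mib_per_minute, 1))   # 10.1
```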

To reword the terms, here are the definitions from Apple:

Sample: a single number representing the value of one audio channel at one point in time.
Frame: a group of one or more samples, with one sample for each channel, representing the audio on all channels at a single point in time.
Packet: a group of one or more frames, representing the audio format's smallest encoding unit, and the audio for all channels across a short amount of time.

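As you can see, there is a subtle difference between the audio and video concepts of a frame. In one second of 44.1 kHz stereo audio you get 88200 samples, and therefore 44100 frames.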
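Applied to the PCM-16 stereo source from the question, the sample/frame/packet relationship might look like the sketch below. `StreamDescription` is a hypothetical Python stand-in, not Apple's API; the comments name the AudioStreamBasicDescription fields each attribute is meant to mirror. It assumes interleaved, byte-aligned samples.

```python
from dataclasses import dataclass


@dataclass
class StreamDescription:
    """Hypothetical stand-in for a few AudioStreamBasicDescription fields."""
    sample_rate: float        # frames per second (mirrors mSampleRate)
    channels_per_frame: int   # mirrors mChannelsPerFrame
    bits_per_channel: int     # mirrors mBitsPerChannel
    frames_per_packet: int    # mirrors mFramesPerPacket (1 for uncompressed)

    @property
    def bytes_per_frame(self) -> int:
        # A frame holds one sample per channel.
        # Assumption: interleaved, byte-aligned samples.
        return self.channels_per_frame * self.bits_per_channel // 8

    @property
    def bytes_per_packet(self) -> int:
        # Only meaningful when every packet has the same size, as in PCM.
        return self.bytes_per_frame * self.frames_per_packet

    def samples_per_second(self) -> int:
        return int(self.sample_rate) * self.channels_per_frame


pcm = StreamDescription(
    sample_rate=44_100.0,
    channels_per_frame=2,
    bits_per_channel=16,
    frames_per_packet=1,
)

print(pcm.samples_per_second())  # 88200 samples per second
print(int(pcm.sample_rate))      # 44100 frames per second
print(pcm.bytes_per_frame)       # 4 bytes: 2 channels * 2 bytes each
print(pcm.bytes_per_packet)      # 4 bytes: one frame per packet
```

So for this source a frame is 4 bytes (one 16-bit sample for each of the two channels), and since uncompressed audio has one frame per packet, a packet is also 4 bytes.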

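Compressed formats such as MP3 and AAC pack multiple frames into packets (these packets can then be written into an MP4 file, for example, where they can be efficiently interleaved with video content). As you can imagine, working with larger packets helps the encoder identify bit patterns, which improves coding efficiency.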

MP3, for example, uses packets of 1152 frames, which are the basic atomic unit of an MP3 stream. PCM audio is just a series of samples, so it can be divided down to the individual frame, and it really has no packet size at all.

For AAC, each packet can hold 1024 (or 960) frames. This is described in the Apple documentation you pointed to:

The number of frames in a packet of audio data. For uncompressed audio, the value is 1. For variable bit-rate formats, the value is a larger fixed number, such as 1024 for AAC. For formats with a variable number of frames per packet, such as Ogg Vorbis, set this field to 0.
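To get a feel for these packet sizes, the sketch below converts frames per packet into the amount of audio time one packet covers. The helper names are invented for illustration, and the 44.1 kHz rate is taken from the question rather than being a property of the formats.

```python
def packet_duration_ms(frames_per_packet: int, sample_rate_hz: int) -> float:
    """Milliseconds of audio covered by one packet."""
    return 1000.0 * frames_per_packet / sample_rate_hz


def packets_per_second(frames_per_packet: int, sample_rate_hz: int) -> float:
    """How many packets are needed for one second of audio."""
    return sample_rate_hz / frames_per_packet


SAMPLE_RATE_HZ = 44_100  # assumption: the rate used in the question

# Frames-per-packet values as quoted in the answer text.
for name, frames in [("PCM", 1), ("AAC", 1024), ("MP3", 1152)]:
    duration = packet_duration_ms(frames, SAMPLE_RATE_HZ)
    rate = packets_per_second(frames, SAMPLE_RATE_HZ)
    print(f"{name}: {frames} frames/packet, "
          f"{duration:.2f} ms/packet, {rate:.2f} packets/s")
```

At 44.1 kHz an MP3 packet therefore spans about 26 ms of audio and an AAC packet about 23 ms, while a PCM "packet" is a single frame.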

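In MPEG-based file formats, a packet is called a data frame (not to be confused with the audio frame concept described earlier). For more on this topic, see Brad's comment.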

Regarding "audio - Definitions of audio codec terminology", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23216103/
