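This question is usually asked as part of another question, but the answer turns out to be long. I've decided to answer it here so that I can link to it from elsewhere.
Although I'm not currently aware of a way for Java to produce audio samples for us, this can be the place for it if that changes in the future. I know that JavaFX has some things like this, for example AudioSpectrumListener, but that still isn't a way to access samples directly.
I'm using javax.sound.sampled for playback and/or recording, but I'd like to do something with the audio.
Perhaps I'd like to display it visually or process it in some way.
How do I access audio sample data with Java Sound?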
Best Answer
Well, the simplest answer is that at the moment Java can't produce sample data for the programmer.
This quote is from the official tutorial:
There are two ways to apply signal processing:
- You can use any processing supported by the mixer or its component lines, by querying for Control objects and then setting the controls as the user desires. Typical controls supported by mixers and lines include gain, pan, and reverberation controls.
- If the kind of processing you need isn't provided by the mixer or its lines, your program can operate directly on the audio bytes, manipulating them as desired.
This page discusses the first technique in greater detail, because there is no special API for the second technique.
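Playback with javax.sound.sampled largely acts as a bridge between the file and the audio device. The bytes are read in from the file and sent off. Don't assume the bytes are meaningful audio samples! Unless you happen to have an 8-bit AIFF file, they aren't. (On the other hand, if the samples are definitely 8-bit signed, you can do arithmetic with them. Using 8-bit is one way to avoid the complexity described here, if you're just playing around.)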
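As a small illustration of the 8-bit signed case mentioned above, here is a minimal sketch that halves the amplitude of a buffer in place. The class name EightBitGain and the method halve are hypothetical, and the sketch assumes the buffer really does contain 8-bit signed PCM; with any other format the bytes are not samples and this arithmetic is meaningless.

```java
import java.util.Arrays;

final class EightBitGain {
    /** Halves the amplitude of 8-bit signed PCM samples in place. */
    static void halve(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            // The byte is promoted to int for the division, then narrowed back.
            bytes[i] = (byte) (bytes[i] / 2);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {100, -128, 127, -1};
        halve(bytes);
        System.out.println(Arrays.toString(bytes)); // [50, -64, 63, 0]
    }
}
```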
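So instead, I'll enumerate the types of AudioFormat.Encoding and describe how to decode them yourself. This answer doesn't cover how to encode them, but encoding is included in the complete code example at the bottom. Encoding is mostly just the decoding process in reverse. This is a long answer, but I wanted to give a thorough overview.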
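A Little About Digital Audio
Generally, when digital audio is explained, we're referring to Linear Pulse-Code Modulation (LPCM).
A continuous sound wave is sampled at regular intervals, and the amplitudes are quantized to integers of some scale.
Shown here is a sine wave sampled and quantized to 4-bit:
[Figure: a sine wave sampled and quantized to 4 bits]
(Notice that in two's complement representation, the most positive value is 1 smaller in magnitude than the most negative value. This is a minor detail to be aware of. For example, if you're clipping audio and forget this, the positive clips will overflow.)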
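To make the clipping remark concrete, here is a minimal sketch for 16-bit samples. The class name Clip16 is hypothetical. It clamps to the asymmetric range [-32768, 32767]; clamping the positive side to 32768 instead would wrap around when narrowed to a short.

```java
final class Clip16 {
    static final int MAX = Short.MAX_VALUE; //  32767
    static final int MIN = Short.MIN_VALUE; // -32768

    /** Clamps a value to the 16-bit two's complement range before narrowing. */
    static short clip(int value) {
        return (short) Math.max(MIN, Math.min(MAX, value));
    }

    public static void main(String[] args) {
        System.out.println(clip(40000));   // clamped to the positive limit
        System.out.println((short) 32768); // unclamped narrowing wraps to a negative value
    }
}
```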
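When we have audio on the computer, we have an array of these samples. A sample array is what we want to turn the byte array in to.
To decode PCM samples, we don't care much about the sample rate or the number of channels, so I won't say much about them here. Channels are usually interleaved, so that if we had an array of them, they'd be stored like this: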
Index 0: Sample 0 (Left Channel)
Index 1: Sample 0 (Right Channel)
Index 2: Sample 1 (Left Channel)
Index 3: Sample 1 (Right Channel)
Index 4: Sample 2 (Left Channel)
Index 5: Sample 2 (Right Channel)
...
In other words, for stereo, the samples in the array just alternate between left and right.
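The interleaved layout above can be split into one array per channel. This is a minimal sketch, assuming already-decoded float samples; the class name Deinterleave and the method split are hypothetical, and any trailing partial frame is ignored.

```java
final class Deinterleave {
    /**
     * Splits interleaved samples in to one array per channel.
     * Sample f of channel c is stored at index (f * channels + c).
     */
    static float[][] split(float[] interleaved, int channels) {
        int frames = interleaved.length / channels;
        float[][] out = new float[channels][frames];
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) {
                out[c][f] = interleaved[f * channels + c];
            }
        }
        return out;
    }
}
```

For stereo, split(samples, 2)[0] is the left channel and split(samples, 2)[1] is the right channel.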
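Some Assumptions
All of the code examples will assume the following declarations:
- byte[] bytes; The byte array, read from the AudioInputStream.
- float[] samples; The output sample array that we're going to fill.
- float sample; The sample we're currently working on.
- long temp; An interim value used for general manipulation.
- int i; The position in the byte array where the current sample's data starts.
We'll normalize all of the samples in our float[] array to the range of -1f <= sample <= 1f. All of the floating-point audio I've seen comes this way, and it's pretty convenient.
If our source audio doesn't already come like that (as is the case for integer samples, for example), we can normalize it ourselves using the following: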
sample = sample / fullScale(bitsPerSample);
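Where fullScale is 2^(bitsPerSample - 1), i.e. Math.pow(2, bitsPerSample - 1).
How do I coerce the byte array in to meaningful data?
The byte array contains the sample frames split up and all in a line. This is actually very straightforward except for something called endianness, which is the ordering of the bytes in each sample packet.
Here's a diagram. This sample (packed in to a byte array) holds the decimal value 9999:
24-bit sample as big-endian:
 bytes[i]   bytes[i + 1]   bytes[i + 2]
 ┌──────┐   ┌──────┐       ┌──────┐
 00000000   00100111       00001111
24-bit sample as little-endian:
 bytes[i]   bytes[i + 1]   bytes[i + 2]
 ┌──────┐   ┌──────┐       ┌──────┐
 00001111   00100111       00000000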
They hold the same binary values; however, the byte orders are reversed.
- In big-endian, the more significant bytes come before the less significant bytes.
- In little-endian, the less significant bytes come before the more significant bytes.
WAV files are stored in little-endian order and AIFF files are stored in big-endian order. Endianness can be obtained from AudioFormat.isBigEndian.
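As a minimal sketch of querying these properties, the following builds an AudioFormat by hand and reads back its encoding, sample size and endianness. The class name FormatInfo is hypothetical; in a real program the format would more likely come from AudioInputStream.getFormat() than from a constructor call.

```java
import javax.sound.sampled.AudioFormat;

final class FormatInfo {
    /** Summarizes the properties needed to decode the raw bytes. */
    static String describe(AudioFormat fmt) {
        return fmt.getEncoding() + ", "
             + fmt.getSampleSizeInBits() + "-bit, "
             + (fmt.isBigEndian() ? "big-endian" : "little-endian");
    }

    public static void main(String[] args) {
        // 44.1 kHz, 16-bit, stereo, signed, little-endian (the usual WAV layout)
        AudioFormat fmt = new AudioFormat(44100f, 16, 2, true, false);
        System.out.println(describe(fmt));
    }
}
```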
To concatenate the bytes and put them in to our long temp variable, we:
- Bitwise AND each byte with the mask 0xFF (which is 0b1111_1111) to avoid sign-extension when the byte is automatically promoted. (char, byte and short are promoted to int when arithmetic is performed on them.) See also What does value & 0xff do in Java?
- Bit shift each byte in to position.
- Bitwise OR the bytes together.
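To show why the mask in the first step matters, here is a minimal sketch that assembles a 16-bit little-endian value with and without it. The class name MaskDemo and the example bytes are made up for illustration. The low byte 0x80 has its top bit set, so without the mask it is promoted to a negative int whose upper bits swamp the high byte.

```java
final class MaskDemo {
    /** Concatenates two little-endian bytes, masking each one first. */
    static int withMask(byte lo, byte hi) {
        return (lo & 0xFF) | ((hi & 0xFF) << 8);
    }

    /** The same concatenation without the mask, to show the failure. */
    static int withoutMask(byte lo, byte hi) {
        return lo | (hi << 8);
    }

    public static void main(String[] args) {
        byte lo = (byte) 0x80;
        byte hi = (byte) 0x12;
        System.out.println(withMask(lo, hi));    // 4736, which is 0x1280
        System.out.println(withoutMask(lo, hi)); // -128, the high byte is lost
    }
}
```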
Here's a 24-bit example:
long temp;
if (isBigEndian) {
    temp = (
          ((bytes[i    ] & 0xffL) << 16)
        | ((bytes[i + 1] & 0xffL) <<  8)
        |  (bytes[i + 2] & 0xffL)
    );
} else {
    temp = (
           (bytes[i    ] & 0xffL)
        | ((bytes[i + 1] & 0xffL) <<  8)
        | ((bytes[i + 2] & 0xffL) << 16)
    );
}
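Notice that the shift order is reversed based on endianness.
This can also be generalized to a loop, as can be seen in the full code at the bottom of this answer. (See the unpackAnyBit and packAnyBit methods.)
Now that we have the bytes concatenated together, we can take a few more steps to turn them in to a sample. The next steps depend on the actual encoding.
How do I decode Encoding.PCM_SIGNED?
The two's complement sign must be extended. This means that if the most significant bit (MSB) is set to 1, we fill all the bits above it with 1s. The arithmetic right-shift (>>) will do the filling for us automatically if the sign bit is set, so I usually do it this way:
int bitsToExtend = Long.SIZE - bitsPerSample;
float sample = (temp << bitsToExtend) >> bitsToExtend;
(Where Long.SIZE is 64. If our temp variable wasn't a long, we'd use a different size. If we used e.g. int temp instead, we'd use 32.)
To understand how this works, here's a diagram of sign-extending 8-bit to 16-bit: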
11111111 is the byte value -1, but the upper bits of the short are 0.
Shift the byte's MSB in to the MSB position of the short.
 0000 0000 1111 1111
 << 8
 ───────────────────
 1111 1111 0000 0000
Shift it back and the right-shift fills all the upper bits with 1s.
We now have the short value of -1.
 1111 1111 0000 0000
 >> 8
 ───────────────────
 1111 1111 1111 1111
Positive values (that had a 0 in the MSB) are left unchanged. This is a nice property of the arithmetic right-shift.
Then normalize the sample, as described in Some Assumptions.
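Putting the two steps together, here is a minimal sketch that sign-extends an already-concatenated value and normalizes it. The class name SignedPcm is hypothetical, and fullScale is written out as described in Some Assumptions.

```java
final class SignedPcm {
    /** 2^(bitsPerSample - 1), the largest magnitude of the format. */
    static double fullScale(int bitsPerSample) {
        return Math.pow(2.0, bitsPerSample - 1);
    }

    /** Sign-extends the concatenated bits, then scales to the ±1.0 range. */
    static float decode(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        // Move the sample's sign bit to bit 63, then shift back arithmetically.
        long extended = (temp << bitsToExtend) >> bitsToExtend;
        return (float) (extended / fullScale(bitsPerSample));
    }

    public static void main(String[] args) {
        System.out.println(decode(0x8000L, 16)); // -1.0, the most negative 16-bit sample
        System.out.println(decode(0x7FFFL, 16)); // just under 1.0
    }
}
```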
You might not need to write explicit sign-extension if your code is simple
Java does sign-extension automatically when converting from one integral type to a larger type, for example byte to int. If you know that your input and output format are always signed, you can use the automatic sign-extension while concatenating bytes in the earlier step.
Recall from the section above (How do I coerce the byte array in to meaningful data?) that we used b & 0xFF to prevent sign-extension from occurring. If you just remove the & 0xFF from the highest byte, sign-extension will happen automatically.
For example, the following decodes signed, big-endian, 16-bit samples:
for (int i = 0; i + 1 < bytes.length; i += 2) { // two bytes per 16-bit sample
    int sample = (bytes[i] << 8)          // high byte is sign-extended
               | (bytes[i + 1] & 0xFF);   // low byte is not
    // ...
}
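How do I decode Encoding.PCM_UNSIGNED?
We turn it in to a signed number. Unsigned samples are simply offset so that, for example, with 8-bit samples:
- An unsigned value of 0 corresponds to the most negative signed value (-128).
- An unsigned value of 128, i.e. 2^(bitsPerSample - 1), corresponds to a signed value of 0.
- An unsigned value of 255 corresponds to the most positive signed value (127).
So this turns out to be pretty simple. Just subtract the offset:
float sample = temp - fullScale(bitsPerSample);
Then normalize the sample, as described in Some Assumptions.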
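Here is a minimal sketch of the offset subtraction followed by normalization, using 8-bit unsigned samples as the example. The class name UnsignedPcm is hypothetical.

```java
final class UnsignedPcm {
    /** Removes the unsigned offset, then scales to the ±1.0 range. */
    static float decode(long temp, int bitsPerSample) {
        double fullScale = Math.pow(2.0, bitsPerSample - 1);
        return (float) ((temp - fullScale) / fullScale);
    }

    public static void main(String[] args) {
        System.out.println(decode(0L, 8));   // -1.0
        System.out.println(decode(128L, 8)); // 0.0
        System.out.println(decode(255L, 8)); // 0.9921875
    }
}
```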
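How do I decode Encoding.PCM_FLOAT?
This is new since Java 7.
In practice, floating-point PCM is typically either IEEE 32-bit or IEEE 64-bit and is already normalized to the range of ±1.0. The samples can be obtained with the utility methods Float#intBitsToFloat and Double#longBitsToDouble.
// IEEE 32-bit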
float sample = Float.intBitsToFloat((int) temp);
// IEEE 64-bit
double sampleAsDouble = Double.longBitsToDouble(temp);
float sample = (float) sampleAsDouble; // or just use double for arithmetic
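Combining the byte concatenation from earlier with Float.intBitsToFloat, a 32-bit float sample can be read straight from the byte array. This is a minimal sketch; the class name FloatPcm and the method decode32 are hypothetical, and an int is used for the interim value because exactly 32 bits are needed.

```java
final class FloatPcm {
    /** Reads four bytes starting at i and reinterprets them as an IEEE 32-bit float. */
    static float decode32(byte[] bytes, int i, boolean isBigEndian) {
        int bits = 0;
        for (int b = 0; b < 4; b++) {
            // Big-endian: the first byte is the most significant one.
            int shift = isBigEndian ? 8 * (3 - b) : 8 * b;
            bits |= (bytes[i + b] & 0xFF) << shift;
        }
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        byte[] bigEndianHalf = {0x3F, 0x00, 0x00, 0x00}; // 0x3F000000 is 0.5f
        System.out.println(decode32(bigEndianHalf, 0, true)); // 0.5
    }
}
```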
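How do I decode Encoding.ULAW and Encoding.ALAW?
These are companding compression codecs that are more common in telephony and the like. I assume they're supported by javax.sound.sampled because they're used by Sun's Au format. (However, they aren't limited to just this type of container. For example, WAV can contain these encodings.)
You can conceptualize A-law and μ-law like they're a floating-point format. These are PCM formats, but the range of values is non-linear.
There are two ways to decode them. I'll show the way that uses the mathematical formula. You can also decode them by manipulating the binary directly, which is described in this blog post, but it's more esoteric-looking.
For both, the compressed data is 8-bit. Standardly, A-law is 13-bit when decoded and μ-law is 14-bit when decoded; however, applying the formula yields a range of ±1.0.
Before the formula can be applied, there are three things to do:
- Some of the bits are inverted for storage.
- The values are stored as sign and magnitude (rather than two's complement).
- The formula expects a range of ±1.0, so the 8-bit value has to be scaled.
For μ-law, all the bits are inverted, so: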
temp ^= 0xffL; // 0xff == 0b1111_1111
(Note that we can't use ~, because we don't want to invert the high bits of the long.)
For A-law, every other bit is inverted, so:
temp ^= 0x55L; // 0x55 == 0b0101_0101
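(XOR can be used to do inversion. See How do you set, clear and toggle a bit?)
To convert from sign and magnitude to two's complement, we check whether the sign bit is set and, if so, clear the sign bit and negate the number:
// 0x80 == 0b1000_0000
if ((temp & 0x80L) != 0) {
    temp ^= 0x80L;
    temp = -temp;
}
Then scale the encoded numbers, the same way as described in Some Assumptions:
sample = temp / fullScale(8);
Now we can apply the expansion.
The μ-law formula translated to Java is then:
sample = (float) (
    signum(sample)
    *
    (1.0 / 255.0)
    *
    (pow(256.0, abs(sample)) - 1.0)
);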
The A-law formula translated to Java is then:
float signum = signum(sample);
sample = abs(sample);
if (sample < (1.0 / (1.0 + log(87.7)))) {
    sample = (float) (
        sample * ((1.0 + log(87.7)) / 87.7)
    );
} else {
    sample = (float) (
        exp((sample * (1.0 + log(87.7))) - 1.0) / 87.7
    );
}
sample = signum * sample;
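For reference, here are the μ-law steps from this section chained in to one method: bit inversion, sign and magnitude conversion, scaling and expansion. This is a minimal sketch that simply follows the steps as described; the class name MuLawDecode is hypothetical, and it has not been checked against a reference decoder.

```java
final class MuLawDecode {
    /** Decodes one μ-law byte to a sample in the ±1.0 range. */
    static float decode(byte b) {
        long temp = b & 0xffL;

        // 1. Undo the bit inversion used for storage.
        temp ^= 0xffL;

        // 2. Convert sign and magnitude to two's complement.
        if ((temp & 0x80L) != 0) {
            temp ^= 0x80L;
            temp = -temp;
        }

        // 3. Scale by the 8-bit full scale, 2^(8 - 1) = 128.
        double x = temp / 128.0;

        // 4. Apply the μ-law expansion with μ = 255.
        return (float) (
            Math.signum(x) * (1.0 / 255.0) * (Math.pow(256.0, Math.abs(x)) - 1.0)
        );
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0xFF)); // 0.0
        System.out.println(decode((byte) 0x80)); // close to, but below, 1.0
    }
}
```

Because the largest magnitude after step 3 is 127/128 rather than 1, the decoded peak stays slightly below full scale.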
Here's the complete example code for the SimpleAudioConversion class.
package mcve.audio;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioFormat.Encoding;

import static java.lang.Math.*;

/**
 * <p>Performs simple audio format conversion.</p>
 *
 * <p>Example usage:</p>
 *
 * <pre>{@code  AudioInputStream ais = ... ;
 *  SourceDataLine line = ... ;
 *  AudioFormat fmt = ... ;
 *
 *  // do setup
 *
 *  for (int blen = 0; (blen = ais.read(bytes)) > -1;) {
 *      int slen;
 *      slen = SimpleAudioConversion.decode(bytes, samples, blen, fmt);
 *
 *      // do something with samples
 *
 *      blen = SimpleAudioConversion.encode(samples, bytes, slen, fmt);
 *      line.write(bytes, 0, blen);
 *  }}</pre>
 *
 * @author Radiodef
 * @see <a href="http://stackoverflow.com/a/26824664/2891664">Overview on Stack Overflow</a>
 */
public final class SimpleAudioConversion {
    private SimpleAudioConversion() {}

    /**
     * Converts from a byte array to an audio sample float array.
     *
     * @param bytes   the byte array, filled by the AudioInputStream
     * @param samples an array to fill up with audio samples
     * @param blen    the return value of AudioInputStream.read
     * @param fmt     the source AudioFormat
     *
     * @return the number of valid audio samples converted
     *
     * @throws NullPointerException if bytes, samples or fmt is null
     * @throws ArrayIndexOutOfBoundsException
     *         if bytes.length is less than blen or
     *         if samples.length is less than blen / bytesPerSample(fmt.getSampleSizeInBits())
     */
    public static int decode(byte[] bytes,
                             float[] samples,
                             int blen,
                             AudioFormat fmt) {
        int bitsPerSample = fmt.getSampleSizeInBits();
        int bytesPerSample = bytesPerSample(bitsPerSample);
        boolean isBigEndian = fmt.isBigEndian();
        Encoding encoding = fmt.getEncoding();
        double fullScale = fullScale(bitsPerSample);

        int i = 0;
        int s = 0;
        while (i < blen) {
            long temp = unpackBits(bytes, i, isBigEndian, bytesPerSample);
            float sample = 0f;

            if (encoding == Encoding.PCM_SIGNED) {
                temp = extendSign(temp, bitsPerSample);
                sample = (float) (temp / fullScale);
            } else if (encoding == Encoding.PCM_UNSIGNED) {
                temp = unsignedToSigned(temp, bitsPerSample);
                sample = (float) (temp / fullScale);
            } else if (encoding == Encoding.PCM_FLOAT) {
                if (bitsPerSample == 32) {
                    sample = Float.intBitsToFloat((int) temp);
                } else if (bitsPerSample == 64) {
                    sample = (float) Double.longBitsToDouble(temp);
                }
            } else if (encoding == Encoding.ULAW) {
                sample = bitsToMuLaw(temp);
            } else if (encoding == Encoding.ALAW) {
                sample = bitsToALaw(temp);
            }

            samples[s] = sample;

            i += bytesPerSample;
            s++;
        }

        return s;
    }

    /**
     * Converts from an audio sample float array to a byte array.
     *
     * @param samples an array of audio samples to encode
     * @param bytes   an array to fill up with bytes
     * @param slen    the return value of the decode method
     * @param fmt     the destination AudioFormat
     *
     * @return the number of valid bytes converted
     *
     * @throws NullPointerException if samples, bytes or fmt is null
     * @throws ArrayIndexOutOfBoundsException
     *         if samples.length is less than slen or
     *         if bytes.length is less than slen * bytesPerSample(fmt.getSampleSizeInBits())
     */
    public static int encode(float[] samples,
                             byte[] bytes,
                             int slen,
                             AudioFormat fmt) {
        int bitsPerSample = fmt.getSampleSizeInBits();
        int bytesPerSample = bytesPerSample(bitsPerSample);
        boolean isBigEndian = fmt.isBigEndian();
        Encoding encoding = fmt.getEncoding();
        double fullScale = fullScale(bitsPerSample);

        int i = 0;
        int s = 0;
        while (s < slen) {
            float sample = samples[s];
            long temp = 0L;

            if (encoding == Encoding.PCM_SIGNED) {
                temp = (long) (sample * fullScale);
            } else if (encoding == Encoding.PCM_UNSIGNED) {
                temp = (long) (sample * fullScale);
                temp = signedToUnsigned(temp, bitsPerSample);
            } else if (encoding == Encoding.PCM_FLOAT) {
                if (bitsPerSample == 32) {
                    temp = Float.floatToRawIntBits(sample);
                } else if (bitsPerSample == 64) {
                    temp = Double.doubleToRawLongBits(sample);
                }
            } else if (encoding == Encoding.ULAW) {
                temp = muLawToBits(sample);
            } else if (encoding == Encoding.ALAW) {
                temp = aLawToBits(sample);
            }

            packBits(bytes, i, temp, isBigEndian, bytesPerSample);

            i += bytesPerSample;
            s++;
        }

        return i;
    }

    /**
     * Computes the block-aligned bytes per sample of the audio format,
     * using Math.ceil(bitsPerSample / 8.0).
     * <p>
     * Round towards the ceiling because formats that allow bit depths
     * in non-integral multiples of 8 typically pad up to the nearest
     * integral multiple of 8. So for example, a 31-bit AIFF file will
     * actually store 32-bit blocks.
     *
     * @param bitsPerSample the return value of AudioFormat.getSampleSizeInBits
     * @return the block-aligned bytes per sample of the audio format
     */
    public static int bytesPerSample(int bitsPerSample) {
        return (int) ceil(bitsPerSample / 8.0); // optimization: ((bitsPerSample + 7) >>> 3)
    }

    /**
     * Computes the largest magnitude representable by the audio format,
     * using Math.pow(2.0, bitsPerSample - 1). Note that for two's complement
     * audio, the largest positive value is one less than the return value of
     * this method.
     * <p>
     * The result is returned as a double because in the case that
     * bitsPerSample is 64, a long would overflow.
     *
     * @param bitsPerSample the return value of AudioFormat.getSampleSizeInBits
     * @return the largest magnitude representable by the audio format
     */
    public static double fullScale(int bitsPerSample) {
        return pow(2.0, bitsPerSample - 1); // optimization: (1L << (bitsPerSample - 1))
    }

    private static long unpackBits(byte[] bytes,
                                   int i,
                                   boolean isBigEndian,
                                   int bytesPerSample) {
        switch (bytesPerSample) {
            case 1: return unpack8Bit(bytes, i);
            case 2: return unpack16Bit(bytes, i, isBigEndian);
            case 3: return unpack24Bit(bytes, i, isBigEndian);
            default: return unpackAnyBit(bytes, i, isBigEndian, bytesPerSample);
        }
    }

    private static long unpack8Bit(byte[] bytes, int i) {
        return bytes[i] & 0xffL;
    }

    private static long unpack16Bit(byte[] bytes,
                                    int i,
                                    boolean isBigEndian) {
        if (isBigEndian) {
            return (
                  ((bytes[i    ] & 0xffL) << 8)
                |  (bytes[i + 1] & 0xffL)
            );
        } else {
            return (
                   (bytes[i    ] & 0xffL)
                | ((bytes[i + 1] & 0xffL) << 8)
            );
        }
    }

    private static long unpack24Bit(byte[] bytes,
                                    int i,
                                    boolean isBigEndian) {
        if (isBigEndian) {
            return (
                  ((bytes[i    ] & 0xffL) << 16)
                | ((bytes[i + 1] & 0xffL) <<  8)
                |  (bytes[i + 2] & 0xffL)
            );
        } else {
            return (
                   (bytes[i    ] & 0xffL)
                | ((bytes[i + 1] & 0xffL) <<  8)
                | ((bytes[i + 2] & 0xffL) << 16)
            );
        }
    }

    private static long unpackAnyBit(byte[] bytes,
                                     int i,
                                     boolean isBigEndian,
                                     int bytesPerSample) {
        long temp = 0;

        if (isBigEndian) {
            for (int b = 0; b < bytesPerSample; b++) {
                temp |= (bytes[i + b] & 0xffL) << (
                    8 * (bytesPerSample - b - 1)
                );
            }
        } else {
            for (int b = 0; b < bytesPerSample; b++) {
                temp |= (bytes[i + b] & 0xffL) << (8 * b);
            }
        }

        return temp;
    }

    private static void packBits(byte[] bytes,
                                 int i,
                                 long temp,
                                 boolean isBigEndian,
                                 int bytesPerSample) {
        switch (bytesPerSample) {
            case 1: pack8Bit(bytes, i, temp);
                    break;
            case 2: pack16Bit(bytes, i, temp, isBigEndian);
                    break;
            case 3: pack24Bit(bytes, i, temp, isBigEndian);
                    break;
            default: packAnyBit(bytes, i, temp, isBigEndian, bytesPerSample);
                    break;
        }
    }

    private static void pack8Bit(byte[] bytes, int i, long temp) {
        bytes[i] = (byte) (temp & 0xffL);
    }

    private static void pack16Bit(byte[] bytes,
                                  int i,
                                  long temp,
                                  boolean isBigEndian) {
        if (isBigEndian) {
            bytes[i    ] = (byte) ((temp >>> 8) & 0xffL);
            bytes[i + 1] = (byte) ( temp        & 0xffL);
        } else {
            bytes[i    ] = (byte) ( temp        & 0xffL);
            bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
        }
    }

    private static void pack24Bit(byte[] bytes,
                                  int i,
                                  long temp,
                                  boolean isBigEndian) {
        if (isBigEndian) {
            bytes[i    ] = (byte) ((temp >>> 16) & 0xffL);
            bytes[i + 1] = (byte) ((temp >>>  8) & 0xffL);
            bytes[i + 2] = (byte) ( temp         & 0xffL);
        } else {
            bytes[i    ] = (byte) ( temp         & 0xffL);
            bytes[i + 1] = (byte) ((temp >>>  8) & 0xffL);
            bytes[i + 2] = (byte) ((temp >>> 16) & 0xffL);
        }
    }

    private static void packAnyBit(byte[] bytes,
                                   int i,
                                   long temp,
                                   boolean isBigEndian,
                                   int bytesPerSample) {
        if (isBigEndian) {
            for (int b = 0; b < bytesPerSample; b++) {
                bytes[i + b] = (byte) (
                    (temp >>> (8 * (bytesPerSample - b - 1))) & 0xffL
                );
            }
        } else {
            for (int b = 0; b < bytesPerSample; b++) {
                bytes[i + b] = (byte) ((temp >>> (8 * b)) & 0xffL);
            }
        }
    }

    private static long extendSign(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        return (temp << bitsToExtend) >> bitsToExtend;
    }

    private static long unsignedToSigned(long temp, int bitsPerSample) {
        return temp - (long) fullScale(bitsPerSample);
    }

    private static long signedToUnsigned(long temp, int bitsPerSample) {
        return temp + (long) fullScale(bitsPerSample);
    }

    // mu-law constant
    private static final double MU = 255.0;
    // A-law constant
    private static final double A = 87.7;
    // natural logarithm of A
    private static final double LN_A = log(A);

    private static float bitsToMuLaw(long temp) {
        temp ^= 0xffL;
        if ((temp & 0x80L) != 0) {
            temp = -(temp ^ 0x80L);
        }

        float sample = (float) (temp / fullScale(8));

        return (float) (
            signum(sample)
            *
            (1.0 / MU)
            *
            (pow(1.0 + MU, abs(sample)) - 1.0)
        );
    }

    private static long muLawToBits(float sample) {
        double sign = signum(sample);
        sample = abs(sample);

        sample = (float) (
            sign * (log(1.0 + (MU * sample)) / log(1.0 + MU))
        );

        long temp = (long) (sample * fullScale(8));

        if (temp < 0) {
            temp = -temp ^ 0x80L;
        }

        return temp ^ 0xffL;
    }

    private static float bitsToALaw(long temp) {
        temp ^= 0x55L;
        if ((temp & 0x80L) != 0) {
            temp = -(temp ^ 0x80L);
        }

        float sample = (float) (temp / fullScale(8));

        float sign = signum(sample);
        sample = abs(sample);

        if (sample < (1.0 / (1.0 + LN_A))) {
            sample = (float) (sample * ((1.0 + LN_A) / A));
        } else {
            sample = (float) (exp((sample * (1.0 + LN_A)) - 1.0) / A);
        }

        return sign * sample;
    }

    private static long aLawToBits(float sample) {
        double sign = signum(sample);
        sample = abs(sample);

        if (sample < (1.0 / A)) {
            sample = (float) ((A * sample) / (1.0 + LN_A));
        } else {
            sample = (float) ((1.0 + log(A * sample)) / (1.0 + LN_A));
        }

        sample *= sign;

        long temp = (long) (sample * fullScale(8));

        if (temp < 0) {
            temp = -temp ^ 0x80L;
        }

        return temp ^ 0x55L;
    }
}
Regarding "java - How do I use audio sample data from Java Sound?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26824663/