java - How do I use audio sample data from Java Sound?

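This question is usually asked as part of another question, but it turns out that the answer is long. I decided to answer it here so that I can link to it elsewhere.

Although I'm not currently aware of any way in which Java produces audio samples for us, if that changes in the future, this could be a place for it. I know that JavaFX has some things like this, for example AudioSpectrumListener, but it still isn't a way to access samples directly.

I'm using javax.sound.sampled for playback and/or recording, but I'd like to do something with the audio.

Perhaps I'd like to display it visually or process it in some way.

How do I access audio sample data to do that with Java Sound?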

See also:

Java Sound Tutorials (official)

Java Sound Resources (unofficial)

Best answer

Well, the simplest answer is that at the moment Java can't produce sample data for the programmer.

This quote is from the official tutorial:

There are two ways to apply signal processing:

You can use any processing supported by the mixer or its component lines, by querying for Control objects and then setting the controls as the user desires. Typical controls supported by mixers and lines include gain, pan, and reverberation controls.

If the kind of processing you need isn't provided by the mixer or its lines, your program can operate directly on the audio bytes, manipulating them as desired.

This page discusses the first technique in greater detail, because there is no special API for the second technique.

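Playback with javax.sound.sampled largely acts as a bridge between the file and the audio device. The bytes are read in from the file and sent off.

Don't assume the bytes are meaningful audio samples! Unless you happen to have an 8-bit AIFF file, they aren't. (On the other hand, if the samples are definitely 8-bit signed, you can do arithmetic with them. Using 8-bit is one way to avoid the complexity described here, if you're just playing around.)

So instead, I'll enumerate the types of AudioFormat.Encoding and describe how to decode them yourself. This answer won't cover how to encode them, but encoding is included in the complete code example at the bottom. Encoding is mostly just the decoding process in reverse.

This is a long answer, but I wanted to give a thorough overview.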

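A Little About Digital Audio

Generally, when digital audio is explained, we're referring to Linear Pulse-Code Modulation (LPCM).

A continuous sound wave is sampled at regular intervals, and the amplitudes are quantized to integers of some scale.

(Figure: a sine wave sampled and quantized to 4 bits.)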

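(Notice that in two's complement representation the most positive value is 1 less in magnitude than the most negative value; for 4 bits, the range is -8 to 7. This is a minor detail to be aware of. For example, if you're clipping audio and forget this, the positive clips will overflow.)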
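As a rough illustration of sampling and quantizing, here is a minimal sketch that samples one cycle of a sine wave at a hypothetical 16 points and quantizes each value to a 4-bit two's complement integer. It clamps to [-8, 7] so that the positive peak doesn't overflow, which is exactly the asymmetry noted above. The class name, method names and number of points are illustrative assumptions, not part of Java Sound.

```java
import java.util.Arrays;

public class QuantizeSine {
    // Quantizes a value in [-1.0, 1.0] to a signed integer of the given bit depth.
    static int quantize(double x, int bits) {
        int fullScale = 1 << (bits - 1);          // 8 for 4-bit
        long q = Math.round(x * fullScale);        // scale to the integer range
        // Two's complement range is [-fullScale, fullScale - 1], so clamp
        // to avoid overflowing at the positive peak.
        return (int) Math.max(-fullScale, Math.min(fullScale - 1, q));
    }

    // Samples one full sine cycle at `points` regularly spaced positions.
    static int[] sampleSine(int points, int bits) {
        int[] out = new int[points];
        for (int n = 0; n < points; n++) {
            out[n] = quantize(Math.sin(2.0 * Math.PI * n / points), bits);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sampleSine(16, 4)));
    }
}
```

With 16 points this prints [0, 3, 6, 7, 7, 7, 6, 3, 0, -3, -6, -7, -8, -7, -6, -3]: the positive peak is clipped to 7 while the negative peak reaches -8.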

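When we have audio on the computer, we have an array of these samples. A sample array is what we want to turn the byte array into.

To decode PCM samples, we don't care much about the sample rate or the number of channels, so I won't say much about them here. Channels are typically interleaved, so if we had an array of them, they'd be stored like this: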

Index 0: Sample 0 (Left Channel)
Index 1: Sample 0 (Right Channel)
Index 2: Sample 1 (Left Channel)
Index 3: Sample 1 (Right Channel)
Index 4: Sample 2 (Left Channel)
Index 5: Sample 2 (Right Channel)
...

In other words, for stereo, the samples in the array just alternate between left and right.
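If you need one array per channel, a minimal sketch of de-interleaving is below. It assumes the sample array holds whole frames only (its length is a multiple of the channel count); the class and method names are hypothetical.

```java
import java.util.Arrays;

public class Deinterleave {
    // Splits an interleaved sample array into one array per channel.
    // Assumes samples.length is a multiple of channels (whole frames only).
    static float[][] deinterleave(float[] samples, int channels) {
        int frames = samples.length / channels;
        float[][] out = new float[channels][frames];
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) {
                // Frame f, channel c lives at index f * channels + c.
                out[c][f] = samples[f * channels + c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] stereo = {0.1f, -0.1f, 0.2f, -0.2f, 0.3f, -0.3f};
        float[][] split = deinterleave(stereo, 2);
        System.out.println(Arrays.toString(split[0])); // left:  [0.1, 0.2, 0.3]
        System.out.println(Arrays.toString(split[1])); // right: [-0.1, -0.2, -0.3]
    }
}
```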

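Some Assumptions

All of the code examples will assume the following declarations:

byte[] bytes; The byte array read from the AudioInputStream.

float[] samples; The output sample array that we're going to fill.

float sample; The sample we're currently working on.

long temp; An interim value used for general manipulation.

int i; The position in the byte array where the current sample's data starts.

We'll normalize all of the samples in our float[] array to the range -1f <= sample <= 1f. All of the floating-point audio I've seen comes this way, and it's quite convenient.

If our source audio doesn't already come like that (as with integer samples, for example), we can normalize it ourselves using the following: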

sample = (float) (sample / fullScale(bitsPerSample));

Where fullScale is 2^(bitsPerSample - 1), i.e. Math.pow(2, bitsPerSample - 1).
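A small worked example of this normalization for 16-bit samples, where fullScale is 2^15 = 32768, is sketched below. The class and method names are hypothetical; fullScale mirrors the formula above.

```java
public class Normalize {
    // 2^(bitsPerSample - 1): the magnitude of the most negative sample value.
    static double fullScale(int bitsPerSample) {
        return Math.pow(2.0, bitsPerSample - 1);
    }

    // Maps a signed integer sample to the range [-1.0, 1.0).
    static float normalize(long signedSample, int bitsPerSample) {
        return (float) (signedSample / fullScale(bitsPerSample));
    }

    public static void main(String[] args) {
        System.out.println(normalize(16384, 16));  // 0.5
        System.out.println(normalize(-32768, 16)); // -1.0
        System.out.println(normalize(32767, 16));  // just under 1.0
    }
}
```

Because of the two's complement asymmetry, integer input never normalizes to exactly +1.0; the most positive 16-bit value maps to 32767 / 32768.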

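How do I coerce the byte array into meaningful data?
The byte array contains the sample frames split up and all in a line. This is actually very straightforward except for something called endianness, which is the ordering of the bytes in each sample packet.

Here's a diagram. This sample (packed into a byte array) holds the decimal value 9999: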

  24-bit sample as big-endian:

 bytes[i]   bytes[i + 1] bytes[i + 2]
 ┌──────┐     ┌──────┐     ┌──────┐
 00000000     00100111     00001111

 24-bit sample as little-endian:

 bytes[i]   bytes[i + 1] bytes[i + 2]
 ┌──────┐     ┌──────┐     ┌──────┐
 00001111     00100111     00000000

They hold the same binary values; however, the byte orders are reversed.

In big-endian, the more significant bytes come before the less significant bytes.
In little-endian, the less significant bytes come before the more significant bytes.

WAV files are stored in little-endian order and AIFF files are stored in big-endian order. Endianness can be obtained from AudioFormat.isBigEndian.

To concatenate the bytes and put them into our long temp variable, we:

Bitwise AND each byte with the mask 0xFF (which is 0b1111_1111) to avoid sign-extension when the byte is automatically promoted. (char, byte and short are promoted to int when arithmetic is performed on them.) See also What does value & 0xff do in Java?
Bit shift each byte into position.
Bitwise OR the bytes together.

Here's a 24-bit example:

long temp;
if (isBigEndian) {
    temp = (
          ((bytes[i    ] & 0xffL) << 16)
        | ((bytes[i + 1] & 0xffL) <<  8)
        |  (bytes[i + 2] & 0xffL)
    );
} else {
    temp = (
           (bytes[i    ] & 0xffL)
        | ((bytes[i + 1] & 0xffL) <<  8)
        | ((bytes[i + 2] & 0xffL) << 16)
    );
}

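Note that the shift order is reversed depending on endianness.

This can also be generalized to a loop, which can be seen in the full code at the bottom of this answer. (See the unpackAnyBit and packAnyBit methods.)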
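A minimal sketch of such a loop, using the same mask, shift and OR steps, is below. It is not the answer's unpackAnyBit method verbatim; the class and method names are hypothetical. It checks that both byte orders from the 9999 diagram decode to the same value.

```java
public class UnpackDemo {
    // Concatenates bytesPerSample bytes starting at i into a long,
    // without sign extension (each byte is masked with 0xFF).
    static long unpack(byte[] bytes, int i, int bytesPerSample, boolean isBigEndian) {
        long temp = 0L;
        for (int b = 0; b < bytesPerSample; b++) {
            // Big-endian: the first byte is the most significant.
            // Little-endian: the first byte is the least significant.
            int shift = isBigEndian ? 8 * (bytesPerSample - b - 1) : 8 * b;
            temp |= (bytes[i + b] & 0xFFL) << shift;
        }
        return temp;
    }

    public static void main(String[] args) {
        // The 24-bit value 9999 (0x00270F) in both byte orders.
        byte[] big    = {(byte) 0x00, (byte) 0x27, (byte) 0x0F};
        byte[] little = {(byte) 0x0F, (byte) 0x27, (byte) 0x00};
        System.out.println(unpack(big, 0, 3, true));     // 9999
        System.out.println(unpack(little, 0, 3, false)); // 9999
    }
}
```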

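Now that we have the bytes concatenated together, we can take a few more steps to turn them into a sample. The next steps depend on the actual encoding.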

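How do I decode Encoding.PCM_SIGNED?

The two's complement sign must be extended. This means that if the most significant bit (MSB) of the sample is set to 1, we fill all the bits above it with 1s. The arithmetic right-shift (>>) does the filling for us automatically if the sign bit is set, so I usually do it this way:

int bitsToExtend = Long.SIZE - bitsPerSample;
float sample = (temp << bitsToExtend) >> bitsToExtend;

(Where Long.SIZE is 64. If our temp variable weren't a long, we'd use a different constant; for example, with int temp we'd use Integer.SIZE, which is 32.)

To understand how this works, here's a diagram of sign-extending 8-bit to 16-bit: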

 11111111 is the byte value -1, but the upper bits of the short are 0.
 Shift the byte's MSB in to the MSB position of the short.

 0000 0000 1111 1111
 <<                8
 ───────────────────
 1111 1111 0000 0000

 Shift it back and the right-shift fills all the upper bits with 1s.
 We now have the short value of -1.

 1111 1111 0000 0000
 >>                8
 ───────────────────
 1111 1111 1111 1111

Positive values (that had a 0 in the MSB) are left unchanged. This is a nice property of the arithmetic right-shift.

Then normalize the sample, as described in Some Assumptions.
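A runnable sketch of the shift-left, shift-right trick for a few 8-bit and 24-bit values, followed by normalization, is below. The class name is hypothetical; extendSign has the same body as the helper in the full code.

```java
public class SignExtend {
    // Sign-extends the low bitsPerSample bits of temp to a full long by
    // shifting the sample's MSB into bit 63, then arithmetic-shifting back.
    static long extendSign(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        return (temp << bitsToExtend) >> bitsToExtend;
    }

    public static void main(String[] args) {
        System.out.println(extendSign(0xFFL, 8));       // -1
        System.out.println(extendSign(0x800000L, 24));  // -8388608
        System.out.println(extendSign(0x7FFFFFL, 24));  // 8388607
        // Normalize a 24-bit sample by dividing by 2^23.
        float sample = (float) (extendSign(0xC00000L, 24) / Math.pow(2.0, 23));
        System.out.println(sample);                     // -0.5
    }
}
```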

You might not need to write explicit sign-extension if your code is simple

Java does sign-extension automatically when converting from one integral type to a larger type, for example byte to int. If you know that your input and output format are always signed, you can use the automatic sign-extension while concatenating bytes in the earlier step.

Recall from the section above (How do I coerce the byte array into meaningful data?) that we used b & 0xFF to prevent sign-extension from occurring. If you just remove the & 0xFF from the highest byte, sign-extension will happen automatically.

For example, the following decodes signed, big-endian, 16-bit samples:

for (int i = 0; i + 1 < bytes.length; i += 2) { // 2 bytes per sample
    int sample = (bytes[i] << 8) // high byte is sign-extended
               | (bytes[i + 1] & 0xFF); // low byte is not
    // ...
}
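A self-contained version of that loop, which collects the decoded 16-bit values into an int array, is sketched below; the class and method names are hypothetical. The first pair of bytes has its high bit set, so automatic sign-extension makes it negative.

```java
import java.util.Arrays;

public class AutoSignExtend {
    // Decodes signed, big-endian, 16-bit samples into ints.
    // Relies on Java sign-extending the high byte when promoting byte to int.
    static int[] decode16BigEndian(byte[] bytes) {
        int[] out = new int[bytes.length / 2];
        for (int i = 0, s = 0; i + 1 < bytes.length; i += 2, s++) {
            out[s] = (bytes[i] << 8)          // high byte: sign-extended
                   | (bytes[i + 1] & 0xFF);   // low byte: masked, not extended
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, (byte) 0xFE, (byte) 0x27, (byte) 0x0F};
        System.out.println(Arrays.toString(decode16BigEndian(bytes))); // [-2, 9999]
    }
}
```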

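How do I decode Encoding.PCM_UNSIGNED?

We turn it into a signed number. Unsigned samples are just offset so that, for example:

An unsigned value of 0 corresponds to the most negative signed value.

An unsigned value of 2^(bitsPerSample - 1) corresponds to the signed value of 0.

An unsigned value of 2^bitsPerSample - 1 corresponds to the most positive signed value.

So this turns out to be pretty simple. Just subtract the offset:

float sample = (float) (temp - fullScale(bitsPerSample));

Then normalize the sample, as described in Some Assumptions.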
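For 8-bit unsigned audio, the offset is 128. A minimal sketch of the subtract-then-normalize steps for a single byte is below; the class and method names are hypothetical.

```java
public class UnsignedDecode {
    static double fullScale(int bitsPerSample) {
        return Math.pow(2.0, bitsPerSample - 1);
    }

    // Decodes one unsigned 8-bit sample: remove the offset, then normalize.
    static float decodeUnsigned8(byte b) {
        long temp = b & 0xFFL;                    // 0..255, no sign extension
        double signed = temp - fullScale(8);      // -128..127
        return (float) (signed / fullScale(8));   // -1.0 .. ~0.992
    }

    public static void main(String[] args) {
        System.out.println(decodeUnsigned8((byte) 0));    // -1.0
        System.out.println(decodeUnsigned8((byte) 128));  // 0.0
        System.out.println(decodeUnsigned8((byte) 255));  // 0.9921875
    }
}
```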

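How do I decode Encoding.PCM_FLOAT?

This is new since Java 7.

In practice, floating-point PCM is typically either IEEE 32-bit or IEEE 64-bit and is already normalized to the range of ±1.0. The samples can be obtained with the utility methods Float#intBitsToFloat and Double#longBitsToDouble.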

// IEEE 32-bit
float sample = Float.intBitsToFloat((int) temp);

// IEEE 64-bit
double sampleAsDouble = Double.longBitsToDouble(temp);
float sample = (float) sampleAsDouble; // or just use double for arithmetic
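Putting the byte concatenation and Float.intBitsToFloat together for one big-endian 32-bit sample gives something like the sketch below. The class and method names are hypothetical, and the test bytes are produced with Float.floatToRawIntBits so the expected value is known.

```java
public class FloatDecode {
    // Decodes one 32-bit IEEE float sample from 4 big-endian bytes.
    static float decodeFloat32BigEndian(byte[] bytes, int i) {
        long temp = ((bytes[i    ] & 0xFFL) << 24)
                  | ((bytes[i + 1] & 0xFFL) << 16)
                  | ((bytes[i + 2] & 0xFFL) <<  8)
                  |  (bytes[i + 3] & 0xFFL);
        return Float.intBitsToFloat((int) temp);
    }

    public static void main(String[] args) {
        int bits = Float.floatToRawIntBits(-0.25f);
        byte[] bytes = {
            (byte) (bits >>> 24), (byte) (bits >>> 16),
            (byte) (bits >>> 8),  (byte) bits
        };
        System.out.println(decodeFloat32BigEndian(bytes, 0)); // -0.25
    }
}
```

As an alternative to manual shifting, java.nio.ByteBuffer can read the same value with ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getFloat().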

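How do I decode Encoding.ULAW and Encoding.ALAW?

These are companding compression codecs that are more common in telephones and such. I assume javax.sound.sampled supports them because they're used by Sun's Au format. (However, they aren't limited to that type of container. For example, WAV can contain these encodings.)

You can conceptualize A-law and μ-law as if they were a floating-point format. They are PCM formats, but the range of values is non-linear.

There are two ways to decode them. I'll show the way that uses the mathematical formula. You can also decode them by manipulating the binary directly, as described in this blog post, but that looks more cryptic.

For both, the compressed data is 8-bit. Standardly, A-law is 13-bit when decoded and μ-law is 14-bit when decoded; however, applying the formula yields a range of ±1.0.

Before we can apply the formula, there are three things to do:

Some of the bits are standardly inverted for storage, for reasons involving data integrity.

They're stored as sign and magnitude (rather than two's complement).

The formula also expects a range of ±1.0, so the 8-bit value has to be scaled.

For μ-law, all of the bits are inverted, so:

temp ^= 0xffL; // 0xff == 0b1111_1111

(Note that we can't use ~, because we don't want to invert the high bits of the long.)

For A-law, every other bit is inverted, so:

temp ^= 0x55L; // 0x55 == 0b0101_0101

(XOR can be used to do the inversion. See How do you set, clear and toggle a bit?)

To convert the sign and magnitude to two's complement, we:

Check whether the sign bit is set.

If so, clear the sign bit and negate the number.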

// 0x80 == 0b1000_0000
if ((temp & 0x80L) != 0) {
    temp ^= 0x80L;
    temp = -temp;
}
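The inversion and sign-and-magnitude steps can be checked on individual bytes. The sketch below applies the μ-law inversion (XOR with 0xFF) and then the conversion above; for A-law you would XOR with 0x55 instead. The class and method names are hypothetical.

```java
public class MuLawSignMagnitude {
    // Undoes the μ-law bit inversion, then converts the 8-bit
    // sign-and-magnitude value to two's complement.
    static long toSigned(byte b) {
        long temp = b & 0xFFL;         // 0..255, no sign extension
        temp ^= 0xFFL;                 // μ-law: all bits are stored inverted
        if ((temp & 0x80L) != 0) {     // sign bit set?
            temp ^= 0x80L;             // clear the sign bit...
            temp = -temp;              // ...and negate the magnitude
        }
        return temp;                   // -127..127
    }

    public static void main(String[] args) {
        System.out.println(toSigned((byte) 0x80)); // 127
        System.out.println(toSigned((byte) 0x00)); // -127
        System.out.println(toSigned((byte) 0xFF)); // 0
    }
}
```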

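Then scale the encoded number the same way as described in Some Assumptions:

sample = (float) (temp / fullScale(8));

Now we can apply the expansion.

The μ-law formula translated to Java is then: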

sample = (float) (
    signum(sample)
        *
    (1.0 / 255.0)
        *
    (pow(256.0, abs(sample)) - 1.0)
);
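Putting the μ-law steps together (invert, sign and magnitude to two's complement, scale by 128, expand) gives the sketch below for a single byte. It follows the same formula as the answer's bitsToMuLaw method, but the class and method names are hypothetical.

```java
public class MuLawDecode {
    static final double MU = 255.0;

    // Decodes one μ-law byte to a sample in [-1.0, 1.0].
    static float decode(byte b) {
        long temp = (b & 0xFFL) ^ 0xFFL;          // all bits are stored inverted
        if ((temp & 0x80L) != 0) {
            temp = -(temp ^ 0x80L);               // sign and magnitude -> negative
        }
        double y = temp / 128.0;                  // scale by fullScale(8)
        // Expansion: sgn(y) * ((1 + MU)^|y| - 1) / MU
        return (float) (Math.signum(y) * (Math.pow(1.0 + MU, Math.abs(y)) - 1.0) / MU);
    }

    public static void main(String[] args) {
        System.out.println(decode((byte) 0xFF)); // 0.0
        System.out.println(decode((byte) 0x80)); // largest positive value
        System.out.println(decode((byte) 0x00)); // largest negative value
    }
}
```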

The A-law formula translated to Java is then:

float signum = signum(sample);
sample = abs(sample);

if (sample < (1.0 / (1.0 + log(87.7)))) {
    sample = (float) (
        sample * ((1.0 + log(87.7)) / 87.7)
    );
} else {
    sample = (float) (
        exp((sample * (1.0 + log(87.7))) - 1.0) / 87.7
    );
}

sample = signum * sample;
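To see that this expansion is the inverse of the compression formula used for encoding (aLawToBits in the full code), the sketch below applies both curves to a few values in ±1.0 and checks the round trip. It works on already-scaled values only, without the bit manipulation; the class and method names are hypothetical.

```java
public class ALawCurve {
    static final double A = 87.7;
    static final double LN_A = Math.log(A);

    // A-law expansion: companded y in [-1, 1] -> linear x in [-1, 1].
    static double expand(double y) {
        double s = Math.signum(y), m = Math.abs(y);
        double x = (m < 1.0 / (1.0 + LN_A))
                 ? m * (1.0 + LN_A) / A
                 : Math.exp(m * (1.0 + LN_A) - 1.0) / A;
        return s * x;
    }

    // A-law compression (the inverse), as used for encoding.
    static double compress(double x) {
        double s = Math.signum(x), m = Math.abs(x);
        double y = (m < 1.0 / A)
                 ? A * m / (1.0 + LN_A)
                 : (1.0 + Math.log(A * m)) / (1.0 + LN_A);
        return s * y;
    }

    public static void main(String[] args) {
        for (double x : new double[] {-1.0, -0.1, 0.0, 0.005, 0.5, 1.0}) {
            System.out.printf("x=%6.3f  y=%.4f  expand(y)=%.4f%n",
                              x, compress(x), expand(compress(x)));
        }
    }
}
```

The small-magnitude branch is linear and the large-magnitude branch is logarithmic; the thresholds 1/A and 1/(1 + ln A) correspond to each other, so each branch maps back onto the matching branch of the other curve.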

Here is the complete example code for the SimpleAudioConversion class.

package mcve.audio;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioFormat.Encoding;

import static java.lang.Math.*;

/**
 * <p>Performs simple audio format conversion.</p>
 *
 * <p>Example usage:</p>
 *
 * <pre>{@code  AudioInputStream ais = ... ;
 * SourceDataLine  line = ... ;
 * AudioFormat      fmt = ... ;
 *
 * // do setup
 *
 * for (int blen = 0; (blen = ais.read(bytes)) > -1;) {
 *     int slen;
 *     slen = SimpleAudioConversion.decode(bytes, samples, blen, fmt);
 *
 *     // do something with samples
 *
 *     blen = SimpleAudioConversion.encode(samples, bytes, slen, fmt);
 *     line.write(bytes, 0, blen);
 * }}</pre>
 *
 * @author Radiodef
 * @see <a href="http://stackoverflow.com/a/26824664/2891664">Overview on Stack Overflow</a>
 */
public final class SimpleAudioConversion {
    private SimpleAudioConversion() {}

    /**
     * Converts from a byte array to an audio sample float array.
     *
     * @param bytes   the byte array, filled by the AudioInputStream
     * @param samples an array to fill up with audio samples
     * @param blen    the return value of AudioInputStream.read
     * @param fmt     the source AudioFormat
     *
     * @return the number of valid audio samples converted
     *
     * @throws NullPointerException if bytes, samples or fmt is null
     * @throws ArrayIndexOutOfBoundsException
     *         if bytes.length is less than blen or
     *         if samples.length is less than blen / bytesPerSample(fmt.getSampleSizeInBits())
     */
    public static int decode(byte[]      bytes,
                             float[]     samples,
                             int         blen,
                             AudioFormat fmt) {
        int   bitsPerSample = fmt.getSampleSizeInBits();
        int  bytesPerSample = bytesPerSample(bitsPerSample);
        boolean isBigEndian = fmt.isBigEndian();
        Encoding   encoding = fmt.getEncoding();
        double    fullScale = fullScale(bitsPerSample);

        int i = 0;
        int s = 0;
        while (i < blen) {
            long temp = unpackBits(bytes, i, isBigEndian, bytesPerSample);
            float sample = 0f;

            if (encoding == Encoding.PCM_SIGNED) {
                temp = extendSign(temp, bitsPerSample);
                sample = (float) (temp / fullScale);

            } else if (encoding == Encoding.PCM_UNSIGNED) {
                temp = unsignedToSigned(temp, bitsPerSample);
                sample = (float) (temp / fullScale);

            } else if (encoding == Encoding.PCM_FLOAT) {
                if (bitsPerSample == 32) {
                    sample = Float.intBitsToFloat((int) temp);
                } else if (bitsPerSample == 64) {
                    sample = (float) Double.longBitsToDouble(temp);
                }
            } else if (encoding == Encoding.ULAW) {
                sample = bitsToMuLaw(temp);

            } else if (encoding == Encoding.ALAW) {
                sample = bitsToALaw(temp);
            }

            samples[s] = sample;

            i += bytesPerSample;
            s++;
        }

        return s;
    }

    /**
     * Converts from an audio sample float array to a byte array.
     *
     * @param samples an array of audio samples to encode
     * @param bytes   an array to fill up with bytes
     * @param slen    the return value of the decode method
     * @param fmt     the destination AudioFormat
     *
     * @return the number of valid bytes converted
     *
     * @throws NullPointerException if samples, bytes or fmt is null
     * @throws ArrayIndexOutOfBoundsException
     *         if samples.length is less than slen or
     *         if bytes.length is less than slen * bytesPerSample(fmt.getSampleSizeInBits())
     */
    public static int encode(float[]     samples,
                             byte[]      bytes,
                             int         slen,
                             AudioFormat fmt) {
        int   bitsPerSample = fmt.getSampleSizeInBits();
        int  bytesPerSample = bytesPerSample(bitsPerSample);
        boolean isBigEndian = fmt.isBigEndian();
        Encoding   encoding = fmt.getEncoding();
        double    fullScale = fullScale(bitsPerSample);

        int i = 0;
        int s = 0;
        while (s < slen) {
            float sample = samples[s];
            long temp = 0L;

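            if (encoding == Encoding.PCM_SIGNED) {
                // clamp: +1.0 * fullScale would overflow the most positive value
                temp = (long) max(-fullScale, min(fullScale - 1, sample * fullScale));

            } else if (encoding == Encoding.PCM_UNSIGNED) {
                temp = (long) max(-fullScale, min(fullScale - 1, sample * fullScale));
                temp = signedToUnsigned(temp, bitsPerSample);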

            } else if (encoding == Encoding.PCM_FLOAT) {
                if (bitsPerSample == 32) {
                    temp = Float.floatToRawIntBits(sample);
                } else if (bitsPerSample == 64) {
                    temp = Double.doubleToRawLongBits(sample);
                }
            } else if (encoding == Encoding.ULAW) {
                temp = muLawToBits(sample);

            } else if (encoding == Encoding.ALAW) {
                temp = aLawToBits(sample);
            }

            packBits(bytes, i, temp, isBigEndian, bytesPerSample);

            i += bytesPerSample;
            s++;
        }

        return i;
    }

    /**
     * Computes the block-aligned bytes per sample of the audio format,
     * using Math.ceil(bitsPerSample / 8.0).
     * <p>
     * Round towards the ceiling because formats that allow bit depths
     * in non-integral multiples of 8 typically pad up to the nearest
     * integral multiple of 8. So for example, a 31-bit AIFF file will
     * actually store 32-bit blocks.
     *
     * @param  bitsPerSample the return value of AudioFormat.getSampleSizeInBits
     * @return The block-aligned bytes per sample of the audio format.
     */
    public static int bytesPerSample(int bitsPerSample) {
        return (int) ceil(bitsPerSample / 8.0); // optimization: ((bitsPerSample + 7) >>> 3)
    }

    /**
     * Computes the largest magnitude representable by the audio format,
     * using Math.pow(2.0, bitsPerSample - 1). Note that for two's complement
     * audio, the largest positive value is one less than the return value of
     * this method.
     * <p>
     * The result is returned as a double because in the case that
     * bitsPerSample is 64, a long would overflow.
     *
     * @param bitsPerSample the return value of AudioFormat.getSampleSizeInBits
     * @return the largest magnitude representable by the audio format
     */
    public static double fullScale(int bitsPerSample) {
        return pow(2.0, bitsPerSample - 1); // optimization: (1L << (bitsPerSample - 1))
    }

    private static long unpackBits(byte[]  bytes,
                                   int     i,
                                   boolean isBigEndian,
                                   int     bytesPerSample) {
        switch (bytesPerSample) {
            case  1: return unpack8Bit(bytes, i);
            case  2: return unpack16Bit(bytes, i, isBigEndian);
            case  3: return unpack24Bit(bytes, i, isBigEndian);
            default: return unpackAnyBit(bytes, i, isBigEndian, bytesPerSample);
        }
    }

    private static long unpack8Bit(byte[] bytes, int i) {
        return bytes[i] & 0xffL;
    }

    private static long unpack16Bit(byte[]  bytes,
                                    int     i,
                                    boolean isBigEndian) {
        if (isBigEndian) {
            return (
                  ((bytes[i    ] & 0xffL) << 8)
                |  (bytes[i + 1] & 0xffL)
            );
        } else {
            return (
                   (bytes[i    ] & 0xffL)
                | ((bytes[i + 1] & 0xffL) << 8)
            );
        }
    }

    private static long unpack24Bit(byte[]  bytes,
                                    int     i,
                                    boolean isBigEndian) {
        if (isBigEndian) {
            return (
                  ((bytes[i    ] & 0xffL) << 16)
                | ((bytes[i + 1] & 0xffL) <<  8)
                |  (bytes[i + 2] & 0xffL)
            );
        } else {
            return (
                   (bytes[i    ] & 0xffL)
                | ((bytes[i + 1] & 0xffL) <<  8)
                | ((bytes[i + 2] & 0xffL) << 16)
            );
        }
    }

    private static long unpackAnyBit(byte[]  bytes,
                                     int     i,
                                     boolean isBigEndian,
                                     int     bytesPerSample) {
        long temp = 0;

        if (isBigEndian) {
            for (int b = 0; b < bytesPerSample; b++) {
                temp |= (bytes[i + b] & 0xffL) << (
                    8 * (bytesPerSample - b - 1)
                );
            }
        } else {
            for (int b = 0; b < bytesPerSample; b++) {
                temp |= (bytes[i + b] & 0xffL) << (8 * b);
            }
        }

        return temp;
    }

    private static void packBits(byte[]  bytes,
                                 int     i,
                                 long    temp,
                                 boolean isBigEndian,
                                 int     bytesPerSample) {
        switch (bytesPerSample) {
            case  1: pack8Bit(bytes, i, temp);
                     break;
            case  2: pack16Bit(bytes, i, temp, isBigEndian);
                     break;
            case  3: pack24Bit(bytes, i, temp, isBigEndian);
                     break;
            default: packAnyBit(bytes, i, temp, isBigEndian, bytesPerSample);
                     break;
        }
    }

    private static void pack8Bit(byte[] bytes, int i, long temp) {
        bytes[i] = (byte) (temp & 0xffL);
    }

    private static void pack16Bit(byte[]  bytes,
                                  int     i,
                                  long    temp,
                                  boolean isBigEndian) {
        if (isBigEndian) {
            bytes[i    ] = (byte) ((temp >>> 8) & 0xffL);
            bytes[i + 1] = (byte) ( temp        & 0xffL);
        } else {
            bytes[i    ] = (byte) ( temp        & 0xffL);
            bytes[i + 1] = (byte) ((temp >>> 8) & 0xffL);
        }
    }

    private static void pack24Bit(byte[]  bytes,
                                  int     i,
                                  long    temp,
                                  boolean isBigEndian) {
        if (isBigEndian) {
            bytes[i    ] = (byte) ((temp >>> 16) & 0xffL);
            bytes[i + 1] = (byte) ((temp >>>  8) & 0xffL);
            bytes[i + 2] = (byte) ( temp         & 0xffL);
        } else {
            bytes[i    ] = (byte) ( temp         & 0xffL);
            bytes[i + 1] = (byte) ((temp >>>  8) & 0xffL);
            bytes[i + 2] = (byte) ((temp >>> 16) & 0xffL);
        }
    }

    private static void packAnyBit(byte[]  bytes,
                                   int     i,
                                   long    temp,
                                   boolean isBigEndian,
                                   int     bytesPerSample) {
        if (isBigEndian) {
            for (int b = 0; b < bytesPerSample; b++) {
                bytes[i + b] = (byte) (
                    (temp >>> (8 * (bytesPerSample - b - 1))) & 0xffL
                );
            }
        } else {
            for (int b = 0; b < bytesPerSample; b++) {
                bytes[i + b] = (byte) ((temp >>> (8 * b)) & 0xffL);
            }
        }
    }

    private static long extendSign(long temp, int bitsPerSample) {
        int bitsToExtend = Long.SIZE - bitsPerSample;
        return (temp << bitsToExtend) >> bitsToExtend;
    }

    private static long unsignedToSigned(long temp, int bitsPerSample) {
        return temp - (long) fullScale(bitsPerSample);
    }

    private static long signedToUnsigned(long temp, int bitsPerSample) {
        return temp + (long) fullScale(bitsPerSample);
    }

    // mu-law constant
    private static final double MU = 255.0;
    // A-law constant
    private static final double A = 87.7;
    // natural logarithm of A
    private static final double LN_A = log(A);

    private static float bitsToMuLaw(long temp) {
        temp ^= 0xffL;
        if ((temp & 0x80L) != 0) {
            temp = -(temp ^ 0x80L);
        }

        float sample = (float) (temp / fullScale(8));

        return (float) (
            signum(sample)
                *
            (1.0 / MU)
                *
            (pow(1.0 + MU, abs(sample)) - 1.0)
        );
    }

    private static long muLawToBits(float sample) {
        double sign = signum(sample);
        sample = abs(sample);

        sample = (float) (
            sign * (log(1.0 + (MU * sample)) / log(1.0 + MU))
        );

        // clamp to ±127 so the magnitude fits in 7 bits at ±1.0
        long temp = (long) max(-127.0, min(127.0, sample * fullScale(8)));

        if (temp < 0) {
            temp = -temp ^ 0x80L;
        }

        return temp ^ 0xffL;
    }

    private static float bitsToALaw(long temp) {
        temp ^= 0x55L;
        if ((temp & 0x80L) != 0) {
            temp = -(temp ^ 0x80L);
        }

        float sample = (float) (temp / fullScale(8));

        float sign = signum(sample);
        sample = abs(sample);

        if (sample < (1.0 / (1.0 + LN_A))) {
            sample = (float) (sample * ((1.0 + LN_A) / A));
        } else {
            sample = (float) (exp((sample * (1.0 + LN_A)) - 1.0) / A);
        }

        return sign * sample;
    }

    private static long aLawToBits(float sample) {
        double sign = signum(sample);
        sample = abs(sample);

        if (sample < (1.0 / A)) {
            sample = (float) ((A * sample) / (1.0 + LN_A));
        } else {
            sample = (float) ((1.0 + log(A * sample)) / (1.0 + LN_A));
        }

        sample *= sign;

        // clamp to ±127 so the magnitude fits in 7 bits at ±1.0
        long temp = (long) max(-127.0, min(127.0, sample * fullScale(8)));

        if (temp < 0) {
            temp = -temp ^ 0x80L;
        }

        return temp ^ 0x55L;
    }
}

For the original question, java - How do I use audio sample data from Java Sound?, see Stack Overflow: https://stackoverflow.com/questions/26824663/
