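I'm trying to do some audio processing, and I'm really stuck on stereo-to-mono conversion. I have searched the internet for information about it.
As far as I know, I can take the left and right channels, add them, and divide by 2. But when I dump the result back into a WAV file, I get a lot of noise. I know that noise can be introduced while processing the data, for example through overflow in the byte variables.
This is my class for retrieving blocks of byte[] data from an MP3 file: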
public class InputSoundDecoder {

    private int BUFFER_SIZE = 128000;
    private String _inputFileName;
    private File _soundFile;
    private AudioInputStream _audioInputStream;
    private AudioFormat _audioInputFormat;
    private AudioFormat _decodedFormat;
    private AudioInputStream _audioInputDecodedStream;

    public InputSoundDecoder(String fileName) throws UnsuportedSampleRateException{
        this._inputFileName = fileName;
        this._soundFile = new File(this._inputFileName);
        try{
            this._audioInputStream = AudioSystem.getAudioInputStream(this._soundFile);
        }
        catch (Exception e){
            e.printStackTrace();
            System.err.println("Could not open file: " + this._inputFileName);
            System.exit(1);
        }

        this._audioInputFormat = this._audioInputStream.getFormat();
        this._decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 1, 44100, false);
        this._audioInputDecodedStream = AudioSystem.getAudioInputStream(this._decodedFormat, this._audioInputStream);

        /** Supported sample rates */
        switch((int)this._audioInputFormat.getSampleRate()){
            case 22050:
                this.BUFFER_SIZE = 2304;
                break;
            case 44100:
                this.BUFFER_SIZE = 4608;
                break;
            default:
                throw new UnsuportedSampleRateException((int)this._audioInputFormat.getSampleRate());
        }

        System.out.println ("# Channels: " + this._decodedFormat.getChannels());
        System.out.println ("Sample size (bits): " + this._decodedFormat.getSampleSizeInBits());
        System.out.println ("Frame size: " + this._decodedFormat.getFrameSize());
        System.out.println ("Frame rate: " + this._decodedFormat.getFrameRate());
    }

    public byte[] getSamples(){
        byte[] abData = new byte[this.BUFFER_SIZE];
        int bytesRead = 0;
        try{
            bytesRead = this._audioInputDecodedStream.read(abData,0,abData.length);
        }
        catch (Exception e){
            e.printStackTrace();
            System.err.println("Error getting samples from file: " + this._inputFileName);
            System.exit(1);
        }
        if (bytesRead > 0)
            return abData;
        else
            return null;
    }
}
This means that every time I call getSamples, it returns an array like this:
buff = {Lchannel, Rchannel, Lchannel, Rchannel,Lchannel, Rchannel,Lchannel, Rchannel...}
The processing routine that converts to mono looks like this:
byte[] buff = null;
while( (buff = _input.getSamples()) != null ){

    /** Convert to mono */
    byte[] mono = new byte[buff.length/2];

    for (int i = 0 ; i < mono.length/2; ++i){
        int left = (buff[i * 4] << 8) | (buff[i * 4 + 1] & 0xff);
        int right = (buff[i * 4 + 2] <<8) | (buff[i * 4 + 3] & 0xff);
        int avg = (left + right) / 2;
        short m = (short)avg; /*Mono is an average between 2 channels (stereo)*/
        mono[i * 2] = (byte)((short)(m >> 8));
        mono[i * 2 + 1] = (byte)(m & 0xff);
    }
And I write it to a WAV file using:
public static void writeWav(byte [] theResult, int samplerate, File outfile) {
    // now convert theResult into a wav file
    // probably should use a file if samplecount is too big!
    int theSize = theResult.length;

    InputStream is = new ByteArrayInputStream(theResult);
    //Short2InputStream sis = new Short2InputStream(theResult);

    AudioFormat audioF = new AudioFormat(
            AudioFormat.Encoding.PCM_SIGNED,
            samplerate,
            16,
            1,          // channels
            2,          // framesize
            samplerate,
            false
    );

    AudioInputStream ais = new AudioInputStream(is, audioF, theSize);

    try {
        AudioSystem.write(ais, AudioFileFormat.Type.WAVE, outfile);
    } catch (IOException ioe) {
        System.err.println("IO Exception; probably just done with file");
        return;
    }
}
with 44100 as the sample rate.
Keep in mind that the byte[] array I get is already PCM, so the MP3 -> PCM conversion is done by specifying:
this._decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 1, 44100, false);
this._audioInputDecodedStream = AudioSystem.getAudioInputStream(this._decodedFormat, this._audioInputStream);
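For reference, the arguments of that AudioFormat constructor are, in order: encoding, sample rate, sample size in bits, channels, frame size in bytes, frame rate, and a big-endian flag. For 16-bit stereo a frame would normally hold 2 channels x 2 bytes = 4 bytes, and passing false as the last argument requests little-endian samples (low byte first). The sketch below only illustrates how those fields relate; the class name DecodedFormatSketch and the helper pcmSigned are hypothetical and not part of the code above.

```java
import javax.sound.sampled.AudioFormat;

public class DecodedFormatSketch {

    /**
     * Builds a signed PCM format whose frame size is derived from the
     * channel count and sample size instead of being passed by hand.
     */
    public static AudioFormat pcmSigned(float sampleRate, int bits,
                                        int channels, boolean bigEndian) {
        // One frame holds one sample per channel, each rounded up to whole bytes.
        int frameSize = channels * ((bits + 7) / 8);
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                sampleRate, bits, channels, frameSize, sampleRate, bigEndian);
    }

    public static void main(String[] args) {
        AudioFormat f = pcmSigned(44100f, 16, 2, false);
        System.out.println("Frame size (bytes): " + f.getFrameSize());
        System.out.println("Big-endian: " + f.isBigEndian());
    }
}
```

Checking getFrameSize() and isBigEndian() on the decoded format is a cheap way to confirm what byte layout the buffer returned by getSamples should have.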
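As I said, I get a lot of noise when writing to the WAV file. I intend to apply an FFT to each block of bytes, but I think the results are incorrect because of the noise.
I'm taking two songs, one of which is a 20-second crop of the other, and when I compare the FFT results of the crop with the matching 20-second subset of the original, they don't match at all.
I think the incorrect stereo -> mono conversion is the cause.
I hope someone knows about this,
Regards.
Best answer
As pointed out in the comments, the byte order may be wrong. In addition, casting to a signed short and shifting it may cause the first byte to be 0xFF.
Try: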
int HI = 0; int LO = 1;
int left = (buff[i * 4 + HI] << 8) | (buff[i * 4 + LO] & 0xff);
int right = (buff[i * 4 + 2 + HI] << 8) | (buff[i * 4 + 2 + LO] & 0xff);
int avg = (left + right) / 2;
mono[i * 2 + HI] = (byte)((avg >> 8) & 0xff);
mono[i * 2 + LO] = (byte)(avg & 0xff);
Then swap the values of HI and LO and see whether it gets better.
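The same idea can be sketched without manual shifting by letting java.nio.ByteBuffer handle the byte order. This is a minimal sketch, assuming interleaved 16-bit signed stereo PCM; the class name StereoToMono and the method downmix are hypothetical names chosen for illustration. The bigEndian argument should match the last argument of the AudioFormat used for decoding, which is false (little-endian) in the question.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class StereoToMono {

    /**
     * Downmixes interleaved 16-bit signed stereo PCM to mono by averaging
     * the left and right sample of each 4-byte frame. Trailing bytes that
     * do not form a complete frame are ignored.
     */
    public static byte[] downmix(byte[] stereo, boolean bigEndian) {
        ByteOrder order = bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        int frames = stereo.length / 4;

        ByteBuffer in = ByteBuffer.wrap(stereo, 0, frames * 4).order(order);
        ByteBuffer out = ByteBuffer.allocate(frames * 2).order(order);

        for (int i = 0; i < frames; i++) {
            // getShort() sign-extends, so the sum is computed in int
            // and cannot overflow before the division.
            int left = in.getShort();
            int right = in.getShort();
            out.putShort((short) ((left + right) / 2));
        }
        return out.array();
    }

    public static void main(String[] args) {
        // One little-endian frame: left = 1000 (E8 03), right = 2000 (D0 07).
        byte[] stereo = {(byte) 0xE8, 0x03, (byte) 0xD0, 0x07};
        byte[] mono = downmix(stereo, false);
        System.out.printf("%02X %02X%n", mono[0] & 0xff, mono[1] & 0xff);
    }
}
```

Because the output is written with the same byte order as the input, the mono buffer can be handed to a writer whose AudioFormat uses the same big-endian flag.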
Regarding java - converting stereo audio to audio bytes, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16466515/