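matlab - Cocktail party algorithm SVD implementation ... in one line of code?

Tags: matlab machine-learning octave linear-algebra svd

In a slide from the introductory lecture on machine learning by Stanford's Andrew Ng on Coursera, he gives the following one-line Octave solution to the cocktail party problem, given that the audio sources are recorded by two spatially separated microphones: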

[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');

At the bottom of the slide is "source: Sam Roweis, Yair Weiss, Eero Simoncelli", and at the bottom of an earlier slide is "Audio clips courtesy of Te-Won Lee". In the video, Professor Ng says,

"So you might look at unsupervised learning like this and ask, 'How complicated is it to implement this?' It seems like in order to build this application, it seems like to do this audio processing, you would write a ton of code, or maybe link into a bunch of C++ or Java libraries that process audio. It seems like it would be a really complicated program to do this audio: separating out audio and so on. It turns out the algorithm to do what you just heard, that can be done with just one line of code ... shown right here. It did take researchers a long time to come up with this line of code. So I'm not saying this is an easy problem. But it turns out that when you use the right programming environment many learning algorithms will be really short programs."

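The separated audio played in the video lecture is not perfect but, in my opinion, amazing. Does anyone have any insight into how that one line of code performs so well? In particular, does anyone know of a reference that explains the work of Te-Won Lee, Sam Roweis, Yair Weiss, and Eero Simoncelli with respect to that one line of code?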
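For readers unpacking the one-liner, the sketch below splits it into named steps and checks that they produce the same matrix. It assumes `x` holds one channel per row and one sample per column; the names `sqnorm`, `weighted`, `M`, and `M_oneliner` and the synthetic test data are illustrative and do not come from the slide.

```octave
% Illustrative data: 2 channels (rows) by 1000 samples (columns).
n = 1:1000;
x = [sin(0.1*n); cos(0.37*n).^3];

% Step 1: squared norm of each sample (each column of x), a 1-by-N row.
sqnorm = sum(x.*x, 1);

% Step 2: scale every sample by its own squared norm (d-by-N).
weighted = repmat(sqnorm, size(x,1), 1) .* x;

% Step 3: accumulate the weighted outer products into a d-by-d matrix,
% i.e. the sum over samples of |x_n|^2 * x_n * x_n'.
M = weighted * x';

% Step 4: the SVD of that small matrix.
[W, s, v] = svd(M);

% The same matrix, computed exactly as written in the one-liner.
M_oneliner = (repmat(sum(x.*x,1),size(x,1),1).*x)*x';
disp(max(abs(M(:) - M_oneliner(:))));
```

With the channels in rows, the matrix handed to `svd` is only d-by-d (2-by-2 for two microphones), no matter how many samples there are.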

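Update

To demonstrate the algorithm's sensitivity to microphone separation distance, the following simulation (in Octave) separates the tones from two spatially separated tone generators.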

% define model 
f1 = 1100;              % frequency of tone generator 1; unit: Hz 
f2 = 2900;              % frequency of tone generator 2; unit: Hz 
Ts = 1/(40*max(f1,f2)); % sampling period; unit: s 
dMic = 1;               % distance between microphones centered about origin; unit: m 
dSrc = 10;              % distance between tone generators centered about origin; unit: m 
c = 340.29;             % speed of sound; unit: m / s 

% generate tones
figure(1);
t = [0:Ts:0.025];
tone1 = sin(2*pi*f1*t);
tone2 = sin(2*pi*f2*t);
plot(t,tone1); 
hold on;
plot(t,tone2,'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -1 1]); legend('tone 1', 'tone 2');
hold off;

% mix tones at microphones
% assume inverse square attenuation of sound intensity (i.e., inverse linear attenuation of sound amplitude)
figure(2);
dNear = (dSrc - dMic)/2;
dFar = (dSrc + dMic)/2;
mic1 = 1/dNear*sin(2*pi*f1*(t-dNear/c)) + ...
       1/dFar*sin(2*pi*f2*(t-dFar/c));
mic2 = 1/dNear*sin(2*pi*f2*(t-dNear/c)) + ...
       1/dFar*sin(2*pi*f1*(t-dFar/c));
plot(t,mic1);
hold on;
plot(t,mic2,'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -1 1]); legend('mic 1', 'mic 2');
hold off;

% use svd to isolate sound sources
figure(3);
x = [mic1' mic2'];
[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
plot(t,v(:,1));
hold on;
maxAmp = max(v(:,1));
plot(t,v(:,2),'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -maxAmp maxAmp]); legend('isolated tone 1', 'isolated tone 2');
hold off;

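After about 10 minutes of execution on my laptop computer, the simulation generates the following three figures, illustrating that the two isolated tones have the correct frequencies.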
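One plausible contributor to the long run time is the shape of `x` in the simulation: it is built as N-by-2 (samples in rows), so the expression passed to `svd` is N-by-N. The sketch below only reports the sizes involved; the sine stand-ins for the microphone signals and the names `M_rows` and `M_cols` are assumptions made for illustration.

```octave
% Same sampling setup as the simulation.
f1 = 1100; f2 = 2900;
Ts = 1/(40*max(f1,f2));
t  = 0:Ts:0.025;
N  = numel(t);

% Stand-in signals; only the shapes matter here.
x = [sin(2*pi*f1*t)' sin(2*pi*f2*t)'];          % N-by-2, samples in rows

% Expression from the simulation: the result is N-by-N.
M_rows = (repmat(sum(x.*x,1),size(x,1),1).*x)*x';
disp(size(M_rows));

% Same expression with channels in rows: the result is 2-by-2.
xt = x';
M_cols = (repmat(sum(xt.*xt,1),size(xt,1),1).*xt)*xt';
disp(size(M_cols));
```

Because `M_rows` is the product of an N-by-2 and a 2-by-N matrix, its rank is at most 2, which is consistent with only the first two columns of `v` being plotted.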

[Figure 1: tone 1 and tone 2. Figure 2: mic 1 and mic 2. Figure 3: isolated tone 1 and isolated tone 2.]

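However, setting the microphone separation distance to zero (i.e., dMic = 0) causes the simulation to generate the following three figures instead, illustrating that the simulation could not isolate a second tone (confirmed by the single significant diagonal term returned in svd's s matrix).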

[Figures 1 to 3 with no microphone separation.]
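The zero-separation case can be checked directly from the mixing model: with dMic = 0, dNear equals dFar, so both microphones record the same signal and the data matrix has rank 1. The sketch below reuses the simulation's formulas; `maxDiff` and `r` are names introduced here for illustration.

```octave
% Model parameters from the simulation, with the microphones co-located.
f1 = 1100; f2 = 2900; c = 340.29;
Ts = 1/(40*max(f1,f2));
t  = 0:Ts:0.025;
dMic = 0; dSrc = 10;

dNear = (dSrc - dMic)/2;   % both distances are 5 m when dMic = 0
dFar  = (dSrc + dMic)/2;

mic1 = 1/dNear*sin(2*pi*f1*(t-dNear/c)) + 1/dFar*sin(2*pi*f2*(t-dFar/c));
mic2 = 1/dNear*sin(2*pi*f2*(t-dNear/c)) + 1/dFar*sin(2*pi*f1*(t-dFar/c));

% The two recordings coincide, so the two columns of x are identical.
maxDiff = max(abs(mic1 - mic2));
x = [mic1' mic2'];
r = rank(x);
disp([maxDiff r]);
```

With identical recordings there is only one independent observation of the two tones, which matches the single significant singular value reported above.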

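I was hoping the microphone separation distance on a smartphone would be large enough to produce good results, but setting the microphone separation distance to 5.25 inches (i.e., dMic = 0.1333 meters) causes the simulation to generate the following, less than encouraging, figures, which show higher-frequency components in the first isolated tone.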

[Figures 1 to 3 with smartphone microphone separation.]
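As a worked check of the numbers in the smartphone case, the sketch below converts 5.25 inches to meters and expresses the extra travel time between the two microphones in cycles of each tone. In the simulation's geometry the path difference dFar - dNear equals dMic. The names `delay`, `cycles1`, and `cycles2` are introduced here, and whether this delay accounts for the degraded result is not established by the simulation.

```octave
c  = 340.29;            % speed of sound used in the simulation; unit: m / s
f1 = 1100; f2 = 2900;   % tone frequencies; unit: Hz

dMic  = 5.25 * 0.0254;  % inches to meters: 0.13335 m
delay = dMic / c;       % extra travel time to the far microphone; unit: s

cycles1 = delay * f1;   % delay expressed in periods of tone 1
cycles2 = delay * f2;   % delay expressed in periods of tone 2
disp([dMic delay cycles1 cycles2]);
```

At this spacing the delay is less than half a period of the 1100 Hz tone but more than one full period of the 2900 Hz tone.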

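Best answer

I was trying to figure this out as well, 2 years later. But I got my answers; hopefully it will help someone.

You need 2 audio recordings. You can get audio examples from http://research.ics.aalto.fi/ica/cocktail/cocktail_en.cgi.

The reference for the implementation is http://www.cs.nyu.edu/~roweis/kica.html

OK, here's the code: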

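% Load the two mixed recordings (assumed mono and of equal length).
[x1, Fs1] = audioread('mix1.wav');
[x2, Fs2] = audioread('mix2.wav');
xx = [x1, x2]';   % 2-by-N: one microphone per row

% Center the mixtures and whiten them with the inverse square root of the covariance.
yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));

% The one-liner from the slide, applied to the whitened data.
[W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');

a = W*xx; %W is unmixing matrix
subplot(2,2,1); plot(x1); title('mixed audio - mic 1');
subplot(2,2,2); plot(x2); title('mixed audio - mic 2');
subplot(2,2,3); plot(a(1,:), 'g'); title('unmixed wave 1');
subplot(2,2,4); plot(a(2,:),'r'); title('unmixed wave 2');

% Write each unmixed channel as a column vector (one sample per row).
audiowrite('unmixed1.wav', a(1,:)', Fs1);
audiowrite('unmixed2.wav', a(2,:)', Fs1);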
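The step the answer adds on top of the slide's one-liner is the whitening line. As a minimal sketch of what that line does, the code below runs it on synthetic mixtures and checks that the whitened data has an identity covariance. The two sources, the mixing matrix `A`, and the name `C` are hypothetical and used only for illustration; the sketch does not try to judge separation quality.

```octave
% Hypothetical sources: a sine and a sawtooth, 10000 samples each.
n  = 0:9999;
s1 = sin(2*pi*0.013*n);
s2 = 2*mod(0.0071*n, 1) - 1;

% Hypothetical mixing matrix standing in for the room and microphones.
A  = [1.0 0.6; 0.4 1.0];
xx = A * [s1; s2];                 % 2-by-N mixtures, one channel per row

% Whitening line from the answer: center, then apply cov^(-1/2).
yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));

% After whitening, the channel covariance should be close to the identity.
C = cov(yy');
disp(C);

% The one-liner then operates on a 2-by-2 matrix built from yy.
[W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');
disp(W);
```

Whitening removes the second-order correlation between the channels, so whatever the SVD step recovers has to come from higher-order statistics of the whitened data.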


Regarding "matlab - Cocktail party algorithm SVD implementation ... in one line of code?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20414667/
