python - MFCC and delta coefficients in 3 Python libraries

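I have recently been working on an assignment about MFCCs, and I can't figure out some of the differences between these libraries.

The 3 libraries I am using are: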

python_speech_features

SpeechPy

LibROSA

samplerate = 16000
NFFT = 512
NCEPT = 13

Part 1: Mel filter bank

temp1_fb = pyspeech.get_filterbanks(nfilt=NFILT, nfft=NFFT, samplerate=sample1)
# speechpy does not halve the FFT size and add 1 when building the filter bank
temp2_fb = speechpy.feature.filterbanks(num_filter=NFILT, fftpoints=NFFT, sampling_freq=sample1)
temp3_fb = librosa.filters.mel(sr=sample1, n_fft=NFFT, n_mels=NFILT)
# undo librosa's normalization so that each filter peaks at 1
temp3_fb /= np.max(temp3_fb, axis=-1)[:, None]

pic1

Only speechpy returns a filter bank of shape (, 512); the other two return (, 257). The librosa plot also looks slightly distorted.
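
The (, 257) versus (, 512) shapes appear to come down to how many FFT bins are kept. A real-valued frame of length NFFT has only NFFT // 2 + 1 unique frequency bins, which matches the 257 columns seen above. The NumPy-only sketch below checks that count; it does not call any of the three libraries, and the variable names are illustrative.

```python
import numpy as np

NFFT = 512

# A random real-valued frame stands in for one frame of audio.
frame = np.random.default_rng(0).standard_normal(NFFT)

# One-sided spectrum of a real signal: bins 0 .. NFFT/2 inclusive.
one_sided = np.fft.rfft(frame, n=NFFT)
# Full two-sided spectrum: the upper half mirrors the lower half.
two_sided = np.fft.fft(frame, n=NFFT)

n_unique_bins = NFFT // 2 + 1
print(one_sided.shape, two_sided.shape, n_unique_bins)
```

This is consistent with the code comment above that speechpy does not halve the FFT size and add 1. To compare the filter banks column by column, the speechpy one would need to be built for (or truncated to) NFFT // 2 + 1 bins first.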

Part 2: MFCC

# pyspeech without liftering, using a Hamming window
temp1_mfcc = pyspeech.mfcc(speaker1, samplerate=sample1, winlen=0.025, winstep=0.01, numcep=NCEPT, nfilt=NFILT, nfft=NFFT,
                           preemph=0.97, ceplifter=0, winfunc=np.hamming, appendEnergy=False)
# speechpy needs a pre-emphasized signal. It uses a fixed rectangular window. Its mel filter bank is not the same
temp2_mfcc = speechpy.feature.mfcc(emphasized_speaker1, sampling_frequency=sample1, frame_length=0.025, frame_stride=0.01,
                                   num_cepstral=NCEPT, num_filters=NFILT, fft_length=NFFT)
# librosa needs a pre-emphasized signal. Uses log energy. Its STFT uses a Hann window, but its framing is different
# (when S is given, librosa works from S; y is passed by keyword for compatibility with newer versions)
temp3_energy = librosa.feature.melspectrogram(y=emphasized_speaker1, sr=sample1, S=temp3_pow.T, n_fft=NFFT,
                                              hop_length=frame_step, n_mels=NFILT).T
temp3_energy = np.log(temp3_energy)
temp3_mfcc = librosa.feature.mfcc(y=emphasized_speaker1, sr=sample1, S=temp3_energy.T, n_mfcc=13, dct_type=2, n_fft=NFFT,
                                  hop_length=frame_step).T

pic2

I've tried my best to make the conditions fair. The speechpy plot comes out darker.
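
The code above passes `emphasized_speaker1` to speechpy and librosa but does not show how it was produced. A minimal sketch of the usual first-order pre-emphasis filter, y[n] = x[n] - 0.97 x[n-1], is given below. It assumes the same coefficient as `preemph=0.97` in the pyspeech call. The function name `pre_emphasize` is hypothetical and is not part of any of the three libraries.

```python
import numpy as np

def pre_emphasize(signal, coeff=0.97):
    """First-order high-pass: y[0] = x[0], y[n] = x[n] - coeff * x[n-1]."""
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0:
        return signal
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

# A constant signal is almost entirely removed after the first sample.
x = np.array([1.0, 1.0, 1.0, 1.0])
y = pre_emphasize(x)
print(y)
```

Applying the same filter before all three libraries (and setting `preemph=0` in pyspeech in that case, so it is not applied twice) is one way to remove this stage as a source of differences.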

Part 3: Delta coefficients

temp1 = pyspeech.delta(mfcc_speaker1, 2)
temp2 = speechpy.processing.derivative_extraction(mfcc_speaker1.T, 1).T
# librosa: differentiate along the frame axis
temp3 = librosa.feature.delta(mfcc_speaker1, width=5, axis=0, order=1)

pic3

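I can't pass the MFCC matrix directly to speechpy, or the result looks very strange. The parameters also do not behave the way I expected.

I would like to know what causes these differences. Is it only the things I mentioned above, or have I made some mistakes? A detailed answer would be appreciated, thanks.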
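
For reference, a common definition of delta coefficients is the regression formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), computed along the frame axis. The NumPy sketch below implements that formula with edge padding; as far as I know this is the form pyspeech uses, but treat that as an assumption. The name `regression_delta` is hypothetical. librosa instead applies a Savitzky-Golay filter over `width` frames, so `width=5` spans the same five frames as N=2, although edge handling may differ.

```python
import numpy as np

def regression_delta(features, N=2):
    """Delta along axis 0 (frames) using +/- N neighbouring frames."""
    features = np.asarray(features, dtype=float)
    if N < 1:
        raise ValueError("N must be >= 1")
    num_frames = features.shape[0]
    denominator = 2 * sum(n * n for n in range(1, N + 1))
    # Repeat the first and last frames so every frame has N neighbours.
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    delta = np.zeros_like(features)
    for n in range(1, N + 1):
        ahead = padded[N + n : N + n + num_frames]   # c_{t+n}
        behind = padded[N - n : N - n + num_frames]  # c_{t-n}
        delta += n * (ahead - behind)
    return delta / denominator

# A linear ramp with slope 2 per frame, 10 frames x 3 coefficients.
ramp = 2.0 * np.arange(10)[:, None] * np.ones((1, 3))
d = regression_delta(ramp, N=2)
print(d[:, 0])
```

Away from the edges the delta of a linear ramp equals its slope. The first and last N frames deviate because of the padding. The input here is laid out as (frames, coefficients), so a library that expects the transposed layout or a different window parameter will give results that look unrelated.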

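Best answer

There are many MFCC implementations, and they usually differ in small details: the window function shape, the mel filter bank computation, and the DCT can all differ. It is hard to find two libraries that are fully compatible. In the long run this does not matter as long as you use the same implementation everywhere. The differences do not affect the results.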
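
To illustrate one of the factors named above, the NumPy-only sketch below computes the power spectrum of the same frame under a rectangular, a Hamming, and a Hann window. The frame is random noise, and the 400-sample length assumes 25 ms at 16 kHz as in the question. It does not reproduce any library's exact pipeline.

```python
import numpy as np

frame_len = 400  # 25 ms at 16 kHz (assumed from the question's settings)
NFFT = 512
frame = np.random.default_rng(0).standard_normal(frame_len)

windows = {
    "rectangular": np.ones(frame_len),
    "hamming": np.hamming(frame_len),
    "hann": np.hanning(frame_len),
}

# Power spectrum of the windowed frame, zero-padded to NFFT points.
power = {
    name: np.abs(np.fft.rfft(frame * w, n=NFFT)) ** 2 / NFFT
    for name, w in windows.items()
}

for name, p in power.items():
    print(name, p.shape, float(p.sum()))
```

The three spectra have the same shape but different values, so everything computed from them (mel energies, log, DCT) differs too, even before filter bank and DCT conventions come into play.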

The original question on Stack Overflow: https://stackoverflow.com/questions/50924493/
