audio - Autoencoder for sound data in Keras

Tags: audio, keras, autoencoder

I have 2D log-scaled mel-spectrograms of sound samples from 5 different classes.
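For context, spectrogram inputs like these can be computed along the following lines. This is a minimal plain-FFT sketch in NumPy (the mel filterbank is omitted; in practice a library such as librosa's `melspectrogram` would do this step too), and the frame/hop sizes here are illustrative assumptions, not the ones used for the data above:

```python
import numpy as np

def log_spectrogram(y, n_fft=256, hop=128):
    """Frame the signal, window each frame, and return log-magnitude spectra.

    Returns an array of shape (n_frames, n_fft // 2 + 1).
    """
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop:i * hop + n_fft] for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames * window, axis=1))
    return 10.0 * np.log10(magnitude + 1e-10)  # log scale; epsilon avoids log(0)
```

A mel version would additionally multiply `magnitude` by a mel filterbank matrix before taking the log.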

For training, I used a convolutional plus dense neural network in Keras. Here is the code:

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, BatchNormalization, Dropout
from keras.optimizers import Adam
import keras.initializers as initializers

model = Sequential()
model.add(Conv1D(80, 8, activation='relu', padding='same', input_shape=(60, 108)))
model.add(MaxPooling1D(2, padding='same', strides=None))
model.add(Flatten())
initializer = initializers.TruncatedNormal()
model.add(Dense(200, activation='relu', kernel_initializer=initializer, bias_initializer=initializer))
model.add(BatchNormalization())
model.add(Dropout(0.8))
model.add(Dense(50, activation='relu', kernel_initializer=initializer, bias_initializer=initializer))
model.add(Dropout(0.8))
model.add(Dense(5, activation='softmax', kernel_initializer=initializer, bias_initializer=initializer))
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.01),  # 'lr' belongs to the optimizer, not compile()
              metrics=['accuracy'])

Which kind of autoencoder can I apply to this type of data input? Which model? Any suggestion or code example would be helpful. :)

Best Answer

Since I didn't get answers to my questions about the nature of the data, I will assume that we have a set of 2D data with a shape like (NSamples, 60, 108). I will also assume that the answer to my suggestion of using Convolutional2D instead of Convolutional1D is yes.

Here is an example of a convolutional autoencoder model, the final model that can use the trained autoencoder, and how to reuse the autoencoder's weights in the final model:

from keras.layers import Dense, Dropout, Flatten, Reshape, Conv2D, Deconv2D, MaxPooling2D, UpSampling2D, BatchNormalization
from keras.callbacks import ModelCheckpoint
import keras.models as models
import keras.initializers as initializers
from sklearn.model_selection import train_test_split

ae = models.Sequential()
#model.add(Conv1D(80, 8, activation='relu', padding='same',input_shape=(60,108)))
#encoder
c = Conv2D(80, 3, activation='relu', padding='same',input_shape=(60, 108, 1))
ae.add(c)
ae.add(MaxPooling2D(pool_size=(2, 2), padding='same', strides=None))
ae.add(Flatten())
initializer=initializers.TruncatedNormal()
d1 = Dense(200, activation='relu', kernel_initializer=initializer,bias_initializer=initializer)
ae.add(d1)
ae.add(BatchNormalization())
ae.add(Dropout(0.8))
d2 = Dense(50, activation='relu', kernel_initializer=initializer,bias_initializer=initializer)
ae.add(d2)
ae.add(Dropout(0.8))
#decoder
ae.add(Dense(d2.input_shape[1], activation='sigmoid'))
ae.add(Dense(d1.input_shape[1], activation='sigmoid'))
ae.add(Reshape((30, 54, 80)))
ae.add(UpSampling2D((2,2)))
ae.add(Deconv2D(filters=c.filters, kernel_size=c.kernel_size, strides=c.strides, activation=c.activation, padding=c.padding))
ae.add(Deconv2D(filters=1, kernel_size=c.kernel_size, strides=c.strides, activation=c.activation, padding=c.padding))
ae.compile(loss='binary_crossentropy',
           optimizer='adam',  # 'lr' is not a compile() argument; Adam's default lr is already 0.001
           metrics=['accuracy'])
ae.summary()
#now train your convolutional autoencoder to reconstruct your input data
#reshape your data to (NSamples, 60, 108, 1)
#Then train your autoencoder. it can be something like that:
#X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=43)
#pre_mcp = ModelCheckpoint("CAE.hdf5", monitor='val_acc', verbose=2, save_best_only=True, mode='max')
#pre_history = ae.fit(X_train, X_train, epochs=100, validation_data=(X_val, X_val), batch_size=22, verbose=2, callbacks=[pre_mcp])

#model
model = models.Sequential()
#model.add(Conv1D(80, 8, activation='relu', padding='same',input_shape=(60,108)))
model.add(Conv2D(80, 3, activation='relu', padding='same',input_shape=(60, 108, 1)))
model.add(MaxPooling2D(pool_size=(2, 2), padding='same',strides=None))
model.add(Flatten())
initializer=initializers.TruncatedNormal()
model.add(Dense(200, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
model.add(BatchNormalization())
model.add(Dropout(0.8))
model.add(Dense(50, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
model.add(Dropout(0.8))
model.add(Dense(5, activation='softmax', kernel_initializer=initializer,bias_initializer=initializer))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',  # 'lr' is not a compile() argument; Adam's default lr is already 0.001
              metrics=['accuracy'])
#Set weights from the trained autoencoder; the hard-coded indices must line up (0=Conv2D, 3=Dense(200), 4=BatchNormalization, 6=Dense(50))
model.layers[0].set_weights(ae.layers[0].get_weights())       
model.layers[3].set_weights(ae.layers[3].get_weights())  
model.layers[4].set_weights(ae.layers[4].get_weights())  
model.layers[6].set_weights(ae.layers[6].get_weights())  
model.summary()
#Now you can train your model with pre-trained weights from autoencoder
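The hard-coded layer indices above are fragile if either architecture changes. A shape-checked alternative could be sketched as follows (`transfer_matching_weights` is my own helper, not a Keras API); it copies weights layer by layer and stops at the first pair whose weight shapes no longer match:

```python
def transfer_matching_weights(src_model, dst_model):
    """Copy weights from src_model into dst_model, layer by layer,
    stopping at the first pair whose weight shapes diverge
    (e.g. where the decoder ends and the classifier head begins)."""
    for src, dst in zip(src_model.layers, dst_model.layers):
        src_w, dst_w = src.get_weights(), dst.get_weights()
        if len(src_w) != len(dst_w) or any(
                a.shape != b.shape for a, b in zip(src_w, dst_w)):
            break  # architectures no longer match
        dst.set_weights(src_w)

# transfer_matching_weights(ae, model) would replace the four set_weights calls above
```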

A model like this was useful for me on the MNIST dataset: initializing the weights from the autoencoder improved the model's accuracy compared with a model initialized with random weights.

However, I would recommend using several convolutional/deconvolutional layers, probably 3 or more, because in my experience convolutional autoencoders with 3 or more convolutional layers are more effective than ones with a single convolutional layer. In fact, with one convolutional layer I sometimes can't even see any accuracy improvement.
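A deeper variant along those lines could be sketched as below, keeping the same (60, 108, 1) input assumption; the filter counts are arbitrary choices. Two 2x2 pooling/upsampling stages are used so that 60x108 divides evenly and the decoder reproduces the input shape exactly:

```python
from keras.models import Sequential
from keras.layers import Conv2D, Conv2DTranspose, MaxPooling2D, UpSampling2D

def build_deep_cae(input_shape=(60, 108, 1)):
    cae = Sequential()
    # encoder: three convolutional layers, two 2x2 poolings (60x108 -> 15x27)
    cae.add(Conv2D(32, 3, activation='relu', padding='same', input_shape=input_shape))
    cae.add(MaxPooling2D((2, 2), padding='same'))
    cae.add(Conv2D(64, 3, activation='relu', padding='same'))
    cae.add(MaxPooling2D((2, 2), padding='same'))
    cae.add(Conv2D(80, 3, activation='relu', padding='same'))
    # decoder: mirror of the encoder (15x27 -> 60x108)
    cae.add(Conv2DTranspose(64, 3, activation='relu', padding='same'))
    cae.add(UpSampling2D((2, 2)))
    cae.add(Conv2DTranspose(32, 3, activation='relu', padding='same'))
    cae.add(UpSampling2D((2, 2)))
    cae.add(Conv2DTranspose(1, 3, activation='sigmoid', padding='same'))
    cae.compile(loss='binary_crossentropy', optimizer='adam')
    return cae
```

As in the shallower version above, the encoder half can then be reused to initialize the classifier.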

Update:

I checked the autoencoder with the data provided by Emanuela, and I also tried different autoencoder architectures, without any success.

My hypothesis is that the data doesn't contain any significant features that could be distinguished by an autoencoder, or even a CAE.

However, my assumption about the two-dimensional nature of the data seems to be confirmed by reaching a validation accuracy of almost 99.99% (accuracy screenshot omitted).

At the same time, however, the 97.31% accuracy on the training data may indicate potential issues with the dataset, so revising it looks like a good idea.

Additionally, I would suggest using an ensemble of networks. You could, for example, train 10 networks with different validation splits and assign each item the class that receives the most votes.

Here is my code:

from keras.layers.core import Dense, Dropout, Flatten
from keras.layers import Conv2D, BatchNormalization
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
from sklearn.model_selection import train_test_split
import keras.models as models
import keras.initializers as initializers
import msgpack
import numpy as np

with open('SoundDataX.msg', "rb") as fx,open('SoundDataY.msg', "rb") as fy: 
    dataX=msgpack.load(fx) 
    dataY=msgpack.load(fy)

num_samples = len(dataX)
x = np.empty((num_samples, 60, 108, 1), dtype = np.float32)
y = np.empty((num_samples, 4), dtype = np.float32)

for i in range(0, num_samples):
    x[i] = np.asanyarray(dataX[i]).reshape(60, 108, 1)
    y[i] = np.asanyarray(dataY[i])

X_train, X_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=43)

#model
model = models.Sequential()
model.add(Conv2D(128, 3, activation='relu', padding='same', input_shape=(60, 108, 1)))
model.add(Conv2D(128, 5, activation='relu', padding='same'))  # input_shape is only needed on the first layer
model.add(Conv2D(128, 7, activation='relu', padding='same'))
model.add(Flatten())
initializer=initializers.TruncatedNormal()
model.add(Dense(200, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
model.add(BatchNormalization())
model.add(Dropout(0.8))
model.add(Dense(50, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
model.add(Dropout(0.8))
model.add(Dense(4, activation='softmax', kernel_initializer=initializer,bias_initializer=initializer))
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.0001),
              metrics=['accuracy'])
model.summary()
filepath="weights-{epoch:02d}-{val_acc:.7f}-{acc:.7f}.hdf5"
mcp = ModelCheckpoint(filepath, monitor='val_acc', verbose=2, save_best_only=True, mode='max')
history = model.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), batch_size=64, verbose=2, callbacks=[mcp])
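The ensemble idea suggested above (training several networks on different splits and voting) could be sketched as follows. `ensemble_vote` is my own helper name, and it assumes each trained network's softmax predictions have been collected into a list:

```python
import numpy as np

def ensemble_vote(softmax_outputs):
    """Majority vote over per-network predictions.

    softmax_outputs: list of (n_samples, n_classes) arrays, one per trained network.
    Returns the winning class index for each sample.
    """
    # per-network hard predictions: (n_nets, n_samples)
    votes = np.stack([p.argmax(axis=1) for p in softmax_outputs])
    n_classes = softmax_outputs[0].shape[1]
    # count votes per class for each sample: (n_classes, n_samples)
    counts = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_classes), 0, votes)
    return counts.argmax(axis=0)
```

With models trained as above, this would be used as `ensemble_vote([m.predict(X_val) for m in trained_models])`.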

Regarding "audio - Autoencoder for sound data in Keras", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46816206/
