audio - Correct use of TensorFlow's STFT function

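I am trying to build a spectrum plot similar to the one Audacity produces for an audio sample. According to Audacity's wiki page, Plot Spectrum (example attached) does the following: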

Plot Spectrum takes the audio in blocks of 'Size' samples, does the FFT, and averages all the blocks together.
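The averaging described in that quote can be sketched in NumPy. This is only a minimal sketch, not Audacity's actual implementation: the function name `average_spectrum`, the symmetric Hann window, dropping the trailing partial block, and averaging power rather than magnitude are all assumptions made for illustration.

```python
import numpy as np

def average_spectrum(samples, size=512):
    """Average the power spectra of consecutive, non-overlapping blocks."""
    samples = np.asarray(samples, dtype=np.float64)
    num_blocks = len(samples) // size  # assumption: drop the trailing partial block
    if num_blocks == 0:
        raise ValueError("need at least `size` samples")
    blocks = samples[:num_blocks * size].reshape(num_blocks, size)
    window = np.hanning(size)  # assumption: symmetric Hann window
    # One real FFT per block; result shape is (num_blocks, size // 2 + 1).
    spectra = np.fft.rfft(blocks * window, axis=-1)
    power = np.abs(spectra) ** 2
    return power.mean(axis=0)  # average over all blocks

# Usage: one second of a 500 Hz sine sampled at 4000 Hz.
sample_rate = 4000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 500 * t)
spectrum = average_spectrum(tone, size=512)
freqs = np.fft.rfftfreq(512, d=1.0 / sample_rate)
peak_hz = freqs[np.argmax(spectrum)]
```

With a block size of 512 the result has 257 frequency bins, and for this test tone the strongest bin sits at 500 Hz.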

I figured I would use the STFT functionality that TensorFlow recently added.

I am using audio blocks of size 512, and my code is as follows:

import functools
import tensorflow as tf

audio_binary = tf.read_file(audio_file)
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary,
    file_format="wav",
    samples_per_second=4000,
    channel_count=1
)

stft = tf.contrib.signal.stft(
    waveform,
    512,     # frame_length
    512,     # frame_step
    fft_length=512,
    window_fn=functools.partial(tf.contrib.signal.hann_window, periodic=True), # matches audacity
    pad_end=True,
    name="STFT"
)

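But where I expected an FFT result for each frame (512 samples), the result of stft is just an empty array.

What is wrong with the way I am making this call?

I have verified that the waveform audio data is read correctly by running it through the regular tf.fft function.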

Best answer

import functools
import tensorflow as tf

# Values taken from the question; adjust them to match the .wav file.
sample_rate = 4000
frame_length = 512
frame_step = 512
fft_length = 512

audio_file = tf.placeholder(tf.string)

audio_binary = tf.read_file(audio_file)
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary,
    file_format="wav",
    samples_per_second=sample_rate,    # sample rate of the .wav file
    channel_count=1                    # number of audio channels in the .wav file
)

stft = tf.contrib.signal.stft(
    tf.transpose(waveform),    # [samples, channels] -> [channels, samples]; stft frames the last axis
    frame_length,
    frame_step,
    fft_length=fft_length,
    window_fn=functools.partial(tf.contrib.signal.hann_window,
            periodic=False), # matches audacity
    pad_end=False,
    name="STFT"
)
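The key change in the answer appears to be `tf.transpose(waveform)`. `decode_audio` returns a tensor of shape [samples, channels], while `stft` expects [..., samples] and frames the last axis, so without the transpose the framed axis has length 1. The NumPy sketch below only imitates that framing to show the effect on shapes. `frame_last_axis` is a hypothetical helper, not TensorFlow's implementation, and it assumes no end padding (as with `pad_end=False`).

```python
import numpy as np

def frame_last_axis(signal, frame_length, frame_step):
    """Split the last axis into frames, dropping any incomplete trailing frame."""
    signal = np.asarray(signal)
    n = signal.shape[-1]
    if n < frame_length:
        num_frames = 0  # not even one full frame fits
    else:
        num_frames = 1 + (n - frame_length) // frame_step
    starts = np.arange(num_frames) * frame_step
    # Index matrix of shape (num_frames, frame_length).
    index = starts[:, None] + np.arange(frame_length)[None, :]
    return signal[..., index]

# Stand-in for a decoded mono clip: shape [samples, channels] = [4000, 1].
waveform = np.zeros((4000, 1))

frames_wrong = frame_last_axis(waveform, 512, 512)    # frames the channel axis
frames_right = frame_last_axis(waveform.T, 512, 512)  # frames the sample axis

print(frames_wrong.shape)  # (4000, 0, 512): zero frames, hence an empty result
print(frames_right.shape)  # (1, 7, 512): seven full frames of 512 samples
```

Under these assumptions, a 4000-sample mono clip yields no frames at all when the channel axis is last, and seven full frames once the tensor is transposed.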

Regarding audio - correct use of TensorFlow's STFT function, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45903477/
