c# - SpeechSynthesizer 的 SpeakProgressEventArgs 是否不准确?

标签 c# .net-3.5 text-to-speech speechsynthesizer

使用 .Net 3.5 中的 System.Speech.Synthesis.SpeechSynthesizer 类,SpeakProgressEventArgs 的 AudioPosition 属性似乎不准确。

以下代码产生以下输出:

代码:

using System;
using System.Speech.Synthesis;
using System.Threading;

namespace SpeechTest
{
    class Program
    {
        static ManualResetEvent speechDoneEvent = new ManualResetEvent(false);

        static void Main(string[] args)
        {
            SpeechSynthesizer synthesizer = new SpeechSynthesizer();

            synthesizer.SpeakProgress += new EventHandler<SpeakProgressEventArgs>(synthesizer_SpeakProgress);

            synthesizer.SpeakCompleted += new EventHandler<SpeakCompletedEventArgs>(synthesizer_SpeakCompleted);

            synthesizer.SetOutputToWaveFile("Test.wav");

            synthesizer.SpeakAsync("This holiday season, support the music you love by shopping at Made in Washington, online and at one of five local stores. Made in Washington chocolates, bountiful gift baskets and ornaments are the perfect holiday gifts for family, friends and co-workers.");

            speechDoneEvent.WaitOne();
        }

        static void synthesizer_SpeakCompleted(object sender, SpeakCompletedEventArgs e)
        {
            speechDoneEvent.Set();
        }

        static void synthesizer_SpeakProgress(object sender, SpeakProgressEventArgs e)
        {
            Console.WriteLine("SpeakProgress: AudioPosition=" + e.AudioPosition + ",\tCharacterPosition=" + e.CharacterPosition + ",\tCharacterCount=" + e.CharacterCount + ",\tText=" + e.Text);
        }
    }
}

输出:

SpeakProgress: AudioPosition=00:00:00.0043750,  CharacterPosition=0,    CharacterCount=4,       Text=This
SpeakProgress: AudioPosition=00:00:00.2925625,  CharacterPosition=5,    CharacterCount=7,       Text=holiday
SpeakProgress: AudioPosition=00:00:00.9086250,  CharacterPosition=13,   CharacterCount=6,       Text=season
SpeakProgress: AudioPosition=00:00:01.9421250,  CharacterPosition=21,   CharacterCount=7,       Text=support
SpeakProgress: AudioPosition=00:00:02.5621250,  CharacterPosition=29,   CharacterCount=3,       Text=the
SpeakProgress: AudioPosition=00:00:02.6760625,  CharacterPosition=33,   CharacterCount=5,       Text=music
SpeakProgress: AudioPosition=00:00:03.2648125,  CharacterPosition=39,   CharacterCount=3,       Text=you
SpeakProgress: AudioPosition=00:00:03.5199375,  CharacterPosition=43,   CharacterCount=4,       Text=love
SpeakProgress: AudioPosition=00:00:03.8435625,  CharacterPosition=48,   CharacterCount=2,       Text=by
SpeakProgress: AudioPosition=00:00:04.0701875,  CharacterPosition=51,   CharacterCount=8,       Text=shopping
SpeakProgress: AudioPosition=00:00:04.6840625,  CharacterPosition=60,   CharacterCount=2,       Text=at
SpeakProgress: AudioPosition=00:00:04.8036250,  CharacterPosition=63,   CharacterCount=4,       Text=Made
SpeakProgress: AudioPosition=00:00:05.0698125,  CharacterPosition=68,   CharacterCount=2,       Text=in
SpeakProgress: AudioPosition=00:00:05.2521250,  CharacterPosition=71,   CharacterCount=10,      Text=Washington
SpeakProgress: AudioPosition=00:00:06.2961875,  CharacterPosition=83,   CharacterCount=6,       Text=online
SpeakProgress: AudioPosition=00:00:07.0540625,  CharacterPosition=90,   CharacterCount=3,       Text=and
SpeakProgress: AudioPosition=00:00:07.3331250,  CharacterPosition=94,   CharacterCount=2,       Text=at
SpeakProgress: AudioPosition=00:00:07.6818750,  CharacterPosition=97,   CharacterCount=3,       Text=one
SpeakProgress: AudioPosition=00:00:08.0598750,  CharacterPosition=101,  CharacterCount=2,       Text=of
SpeakProgress: AudioPosition=00:00:08.2163750,  CharacterPosition=104,  CharacterCount=4,       Text=five
SpeakProgress: AudioPosition=00:00:08.5971875,  CharacterPosition=109,  CharacterCount=5,       Text=local
SpeakProgress: AudioPosition=00:00:09.0243750,  CharacterPosition=115,  CharacterCount=6,       Text=stores
SpeakProgress: AudioPosition=00:00:10.5325625,  CharacterPosition=123,  CharacterCount=4,       Text=Made
SpeakProgress: AudioPosition=00:00:10.7700625,  CharacterPosition=128,  CharacterCount=2,       Text=in
SpeakProgress: AudioPosition=00:00:10.9377500,  CharacterPosition=131,  CharacterCount=10,      Text=Washington
SpeakProgress: AudioPosition=00:00:11.6708125,  CharacterPosition=142,  CharacterCount=10,      Text=chocolates
SpeakProgress: AudioPosition=00:00:12.9798750,  CharacterPosition=154,  CharacterCount=9,       Text=bountiful
SpeakProgress: AudioPosition=00:00:13.6303125,  CharacterPosition=164,  CharacterCount=4,       Text=gift
SpeakProgress: AudioPosition=00:00:14.0959375,  CharacterPosition=169,  CharacterCount=7,       Text=baskets
SpeakProgress: AudioPosition=00:00:14.7848125,  CharacterPosition=177,  CharacterCount=3,       Text=and
SpeakProgress: AudioPosition=00:00:15.0507500,  CharacterPosition=181,  CharacterCount=9,       Text=ornaments
SpeakProgress: AudioPosition=00:00:15.7195000,  CharacterPosition=191,  CharacterCount=3,       Text=are
SpeakProgress: AudioPosition=00:00:15.9872500,  CharacterPosition=195,  CharacterCount=3,       Text=the
SpeakProgress: AudioPosition=00:00:16.1488750,  CharacterPosition=199,  CharacterCount=7,       Text=perfect
SpeakProgress: AudioPosition=00:00:16.7275000,  CharacterPosition=207,  CharacterCount=7,       Text=holiday
SpeakProgress: AudioPosition=00:00:17.3336875,  CharacterPosition=215,  CharacterCount=5,       Text=gifts
SpeakProgress: AudioPosition=00:00:17.9813125,  CharacterPosition=221,  CharacterCount=3,       Text=for
SpeakProgress: AudioPosition=00:00:18.2216875,  CharacterPosition=225,  CharacterCount=6,       Text=family
SpeakProgress: AudioPosition=00:00:19.0973750,  CharacterPosition=233,  CharacterCount=7,       Text=friends
SpeakProgress: AudioPosition=00:00:19.7726250,  CharacterPosition=241,  CharacterCount=3,       Text=and
SpeakProgress: AudioPosition=00:00:19.9655625,  CharacterPosition=245,  CharacterCount=10,      Text=co-workers
SpeakProgress: AudioPosition=00:00:20.2518750,  CharacterPosition=245,  CharacterCount=10,      Text=co-workers

但是,生成的 .wav 文件的持续时间为 15.69 秒。如果您输出到 Stream 或 null,则会发生相同的行为。

documentation对于该属性,表示该属性是“一个 TimeSpan 对象,表示事件在音频输出流中的时间位置”。

它应该是一个准确的时间,表示输出文件中单词开始或结束的时间,还是我误解了它?

最佳答案

audioPosition 取决于所选语音合成器的声音。对于某些 Microsoft 语音,如 Anna、Zira、David、Hazel,正如我所体验的,支持的音频格式是 16000Hz PCM。所以下面的方案可以修正auido的位置:

var format = 
new System.Speech.AudioFormat.SpeechAudioFormatInfo(EncodingFormat.Pcm, 
                                                    16000, 16, 1, 32000, 2, null);
synthesizer.SetOutputToWaveFile("Test.wav", format);

如果您注意到,SetOutputToWaveFile 的默认采样率为 22050,正确时间 (15.69) 与 AudipPosition 显示的时间 (20.25) 的比率约为 0.77。如果将此比率乘以 22050,您将得到大约 16000,这是正确的采样率。

关于c# - SpeechSynthesizer 的 SpeakProgressEventArgs 是否不准确?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/1718967/

相关文章:

c# - 列出 .net assembly 名称

c# - 如何将 int[] 连接到 .NET 中的字符分隔字符串?

text-to-speech - 为文本到语音的标准文本减慢 Twilio 的 TwiML “Say” 命令

javascript - 使用 Web Speech API 的 SpeechSynthesis 接口(interface)改善发音

text-to-speech - 是否有接受基于 IPA 的音标的文本转语音软件?

c# - Visual Studio 探查器不显示类名

c# - SortedSet<T> 需要使用不同的排序和相等标准

c# - C# 中的栈和堆

c# - 解析 XML 数据时出错

c# - 在 Linq To SQl .net 3.5 中调用函数