c# - How do I get the pronunciation phonemes for a word in C#?

Tags: c# text-to-speech microsoft-speech-api

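First, I should say that I am new to C# programming. I am developing an application that uses C# with SAPI v5.4 (speechlib) to modify the Windows speech dictionary programmatically. Everything works so far, but I need a deeper understanding of how a string is interpreted when it is synthesized (spoken).

My understanding is that SAPI 5.4 breaks words down into phoneme representations, and I have successfully "trained" word pronunciations using phonemes. I also know that I can manually add a word to the Windows Speech Recognition dictionary, provide a recording, and then extract the word's pronunciation (phonemes), but that is cumbersome. It would also be useful to explore how a word is synthesized by default, that is, without any input from me (for example, how does the synthesizer interpret "dolphins"?).

From a coding standpoint, this is what I have so far: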

using System;
using System.Speech.Synthesis;

namespace SpeechTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Set up the speech synthesizer
            SpeechSynthesizer synthesizer = new SpeechSynthesizer();
            synthesizer.Volume = 100;
            synthesizer.Rate = -2;

            // Configure the audio output 
            synthesizer.SetOutputToDefaultAudioDevice();

            // Initialize string to store word of interest (not in the speech dictionary)
            string myWord = "dolphins";

            // Speak the word of interest
            synthesizer.Speak(myWord);

            // Retrieve pronunciation of myWord
            string myPronunciation = null; // *some code here*

            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
    }
}

Best answer

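Thanks to the excellent work of Casey Chesnut, I have figured out how to determine the IPA phonemes for a given string. Now I just need to work out how to convert IPA phonemes to SAPI symbols, but that is a separate topic (for how to get SAPI phonemes from a text string, see here).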

using System;
using System.Collections.ObjectModel;
using System.ComponentModel;
using System.IO;
using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Windows.Forms;

namespace SpeechTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string MyText = "dolphins"; // Initialize string for storing word (or words) of interest
            string MyPronunciation = GetPronunciationFromText(MyText.Trim()); // Get IPA pronunciation of MyText
            MessageBox.Show(MyText + " = " + MyPronunciation); // Output MyText and MyPronunciation
        }

        public static string recoPhonemes;

        public static string GetPronunciationFromText(string MyWord)
        {
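            // Trick for finding the phonemes used by the synthesis engine:
            // synthesize the text to audio, then run recognition on that audio
            // and read the pronunciation back from the recognized words.

            // Step 1: text to wav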
            using (MemoryStream audioStream = new MemoryStream())
            {
                using (SpeechSynthesizer synth = new SpeechSynthesizer())
                {
                    synth.SetOutputToWaveStream(audioStream);
                    PromptBuilder pb = new PromptBuilder();
                    //pb.AppendBreak(PromptBreak.ExtraSmall); //'e' wont be recognized if this is large, or non-existent?
                    //synth.Speak(pb);
                    synth.Speak(MyWord);
                    //synth.Speak(pb);
                    synth.SetOutputToNull();
                    audioStream.Position = 0;

                    //now wav to txt (for reco phonemes)
                    recoPhonemes = String.Empty;
                    GrammarBuilder gb = new GrammarBuilder(MyWord);
                    Grammar g = new Grammar(gb); //TODO the hard letters to recognize are 'g' and 'e'
                    // Dispose the recognizer once the pronunciation has been read
                    using (SpeechRecognitionEngine reco = new SpeechRecognitionEngine())
                    {
                        reco.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(reco_SpeechHypothesized);
                        reco.SpeechRecognitionRejected += new EventHandler<SpeechRecognitionRejectedEventArgs>(reco_SpeechRecognitionRejected);
                        reco.UnloadAllGrammars(); //only use the one word grammar
                        reco.LoadGrammar(g);
                        reco.SetInputToWaveStream(audioStream);
                        RecognitionResult rr = reco.Recognize();
                        reco.SetInputToNull();
                        if (rr != null)
                        {
                            recoPhonemes = StringFromWordArray(rr.Words, WordType.Pronunciation);
                        }
                    }
                    //txtRecoPho.Text = recoPhonemes;
                    return recoPhonemes;
                }
            }
        }

        public static string StringFromWordArray(ReadOnlyCollection<RecognizedWordUnit> words, WordType type)
        {
            string text = "";
            foreach (RecognizedWordUnit word in words)
            {
                string wordText = "";
                if (type == WordType.Text || type == WordType.Normalized)
                {
                    wordText = word.Text;
                }
                else if (type == WordType.Lexical)
                {
                    wordText = word.LexicalForm;
                }
                else if (type == WordType.Pronunciation)
                {
                    wordText = word.Pronunciation;
                    //MessageBox.Show(word.LexicalForm);
                }
                else
                {
                    throw new InvalidEnumArgumentException(String.Format("{0}: is not a valid input", type));
                }
                //Use display attribute

                if ((word.DisplayAttributes & DisplayAttributes.OneTrailingSpace) != 0)
                {
                    wordText += " ";
                }
                if ((word.DisplayAttributes & DisplayAttributes.TwoTrailingSpaces) != 0)
                {
                    wordText += "  ";
                }
                if ((word.DisplayAttributes & DisplayAttributes.ConsumeLeadingSpaces) != 0)
                {
                    wordText = wordText.TrimStart();
                }
                if ((word.DisplayAttributes & DisplayAttributes.ZeroTrailingSpaces) != 0)
                {
                    wordText = wordText.TrimEnd();
                }

                text += wordText;

            }
            return text;
        }

        public static void reco_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            recoPhonemes = StringFromWordArray(e.Result.Words, WordType.Pronunciation);
        }

        public static void reco_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
        {
            recoPhonemes = StringFromWordArray(e.Result.Words, WordType.Pronunciation);
        }

    }

    public enum WordType
    {
        Text,
        Normalized = Text,
        Lexical,
        Pronunciation
    }
}

// Credit for method of retrieving IPA pronunciation from a string goes to Casey Chesnut (http://www.mperfect.net/speechSamples/)
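
As a quick check on the IPA string this returns, one option is to feed it back into the synthesizer and listen to the result. The following is only a minimal sketch, written as a separate console program: the class name PronunciationPlayback is hypothetical, and the IPA value is a placeholder assumption, not output captured from the code above. It uses PromptBuilder.AppendTextWithPronunciation from System.Speech.Synthesis. How faithfully the phonemes are rendered is likely to depend on the installed voice.

```csharp
using System;
using System.Speech.Synthesis;

namespace SpeechTest
{
    // Hypothetical helper program, not part of the original answer
    class PronunciationPlayback
    {
        static void Main(string[] args)
        {
            string word = "dolphins";

            // Placeholder IPA string (an assumption). Substitute the value
            // returned by GetPronunciationFromText(word) on your machine.
            string ipa = "ˈdɑlfɪnz";

            using (SpeechSynthesizer synth = new SpeechSynthesizer())
            {
                synth.SetOutputToDefaultAudioDevice();

                // Ask the synthesizer to speak the word using the supplied
                // IPA pronunciation instead of its default one.
                PromptBuilder pb = new PromptBuilder();
                pb.AppendTextWithPronunciation(word, ipa);
                synth.Speak(pb);
            }
        }
    }
}
```

If the word sounds the same as a plain synth.Speak(word), that suggests the recovered phonemes match the engine's default pronunciation.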

Original question on Stack Overflow: https://stackoverflow.com/questions/49519428/
