c# - How do I write a correct regular expression to extract text?

Tags: c# regex

I get the following HTML response from a service:

<style> .transcription, .trsc{line-height:19px; padding-left:20px; font-family:Lucida Sans Unicode; padding-right:5px;} </style><div id="shView"> <div class="cforms_result" id="cforms_result1"> <div class="ref_cform" onclick="javascript:GetFullWordCBK('1', 'wordER');"><span class="fsform_link"><a href="javascript:;" onclick="javascript:GetFullWordCBK('1', 'wordER');"><img src="/images/common/owl_ico16.gif" width="19" height="19" border="0"></a><a href="javascript:;" onclick="javascript:GetFullWordCBK('1', 'wordER');"> Спряжение </a></span><span class="ref_source">mother<wrs><span class="sforms_src"><span class="w_des">Infinitive</span><b>mother</b><br><span class="w_des">Past Indefinite</span><b>mothered</b><br><span class="w_des">Participle II</span><b>mothered</b><br><span class="w_des">Participle I</span><b>mothering</b></span></wrs></span>&nbsp;<span class="ref_info"></span>, <span class="ref_psp">Глагол</span></div> <div class="tr_pr"><span class="transcription">[ˈmʌðə]</span><span class="pronunciation"><a href="javascript:;" class="pbf_s" id="lnkGtTr1" onclick="javascript:ListenWord(this,'mother',1,'play');"><img src="/images/common/vol_on.gif" align="absmiddle" border="0" id="imgGtTr1"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></a><span class="loadFrv" id="loadFrv1"><img hspace="10" src="/images/common/al_fullWR.gif" align="absmiddle"></span><span style="width:20px; height:17px;" class="pbf_s" id="speaker_on1"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></span></span></div> <div id="translations" onclick="javascript:GetFullWordCBK('1', 'wordER');"> <ol> <li><span class="ref_result">относиться по-матерински<wrs><span 
class="sforms_src"></span></wrs></span> <span class="ref_info"></span></li> </ol> </div> </div><script> $('.sforms_src').filter(function(index) { return $(this).html().length == 0;}).remove();//getPrLink('mother ');//$('#speaker_on').unbind('click','ShowFullWRefERRE')//$('#speaker_on').click(function(){alert("не открывать окно расширеной справки");}); </script><div class="cforms_result" id="cforms_result2"> <div class="ref_cform" onclick="javascript:GetFullWordCBK('2', 'wordER');"><span class="fsform_link"><a href="javascript:;" onclick="javascript:GetFullWordCBK('2', 'wordER');"><img src="/images/common/owl_ico16.gif" width="19" height="19" border="0"></a><a href="javascript:;" onclick="javascript:GetFullWordCBK('2', 'wordER');"> Склонение </a></span><span class="ref_source">mother<wrs><span class="sforms_src"><span class="w_des">Singular</span><b>mother</b><br><span class="w_des">Plural</span><b>mothers</b></span></wrs></span>&nbsp;<span class="ref_info"></span>, <span class="ref_psp">Существительное</span></div> <div class="tr_pr"><span class="transcription">[ˈmʌðə]</span><span class="pronunciation"><a href="javascript:;" class="pbf_s" id="lnkGtTr2" onclick="javascript:ListenWord(this,'mother',2,'play');"><img src="/images/common/vol_on.gif" align="absmiddle" border="0" id="imgGtTr2"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></a><span class="loadFrv" id="loadFrv2"><img hspace="10" src="/images/common/al_fullWR.gif" align="absmiddle"></span><span style="width:20px; height:17px;" class="pbf_s" id="speaker_on2"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></span></span></div> <div id="translations" onclick="javascript:GetFullWordCBK('2', 
'wordER');"> <ol> <li><span class="ref_result">мать<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info">f</span></li> <li><span class="ref_result">родительский элемент<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info">m</span><span class="ref_dictionary"> (ИТ - базовый) </span></li> <li><span class="ref_result">родительский<wrs><span class="sforms_src"></span></wrs></span><span class="ref_comment"> (attributive) </span> <span class="ref_info"></span><span class="ref_dictionary"> (ИТ - базовый) </span></li> <li><span class="ref_result">прототип<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info">m</span><span class="ref_dictionary"> (Политехнический) </span></li> <li><span class="ref_result">начало<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info">n</span><span class="ref_dictionary"> (Политехнический) </span></li> </ol> </div> </div><script> $('.sforms_src').filter(function(index) { return $(this).html().length == 0;}).remove();//getPrLink('mother ');//$('#speaker_on').unbind('click','ShowFullWRefERRE')//$('#speaker_on').click(function(){alert("не открывать окно расширеной справки");}); </script><div class="cforms_result" id="cforms_result3"> <div class="ref_cform" onclick="javascript:GetFullWordCBK('3', 'wordER');"><span class="fsform_link"><a href="javascript:;" onclick="javascript:GetFullWordCBK('3', 'wordER');"><img src="/images/common/owl_ico16.gif" width="19" height="19" border="0"></a><a href="javascript:;" onclick="javascript:GetFullWordCBK('3', 'wordER');"> Склонение </a></span><span class="ref_source">mother<wrs><span class="sforms_src"><span class="w_des">Positive</span><b>mother</b><br></span></wrs></span>&nbsp;<span class="ref_info"></span>, <span class="ref_psp">Прилагательное</span></div> <div class="tr_pr"><span class="transcription">[ˈmʌðə]</span><span class="pronunciation"><a href="javascript:;" class="pbf_s" id="lnkGtTr3" 
onclick="javascript:ListenWord(this,'mother',3,'play');"><img src="/images/common/vol_on.gif" align="absmiddle" border="0" id="imgGtTr3"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></a><span class="loadFrv" id="loadFrv3"><img hspace="10" src="/images/common/al_fullWR.gif" align="absmiddle"></span><span style="width:20px; height:17px;" class="pbf_s" id="speaker_on3"><span> powered by <img src="/images/common/logoforvo.gif" width="59" height="17" border="0" hspace="5" align="absmiddle" style="cursor:point; cursor:hand;" onclick="window.open('http://ru.forvo.com/');"></span></span></span></div> <div id="translations" onclick="javascript:GetFullWordCBK('3', 'wordER');"> <ol> <li><span class="ref_result">родительский<wrs><span class="sforms_src"></span></wrs></span> <span class="ref_info"></span><span class="ref_dictionary"> (ИТ - базовый) </span></li> </ol> </div> </div><script> $('.sforms_src').filter(function(index) { return $(this).html().length == 0;}).remove();//getPrLink('mother ');//$('#speaker_on').unbind('click','ShowFullWRefERRE')//$('#speaker_on').click(function(){alert("не открывать окно расширеной справки");}); </script><div id="fullRLink"><a href="javascript:GetFullWordCBK('1', 'wordER');">Показать полную словарную статью</a><span id="al_fullWR"><img src="/images/common/al_fullWR.gif" align="middle" hspace="10"> Загружаем...</span></div></div>

I want to extract the text that appears inside this pattern: <span class="ref_result">TEXT<wrs>

I use this code to collect all the matches:

const string pattern = "ref_result\">\\w+<";
Regex rgx = new Regex(pattern, RegexOptions.Compiled);
var text = SantinizeOutput(result.result);
MatchCollection matches = rgx.Matches(text);
if(matches.Count > 0)
{
  resultsList = new List<string>(matches.Count);
  foreach(Match match in rgx.Matches(text))
  {
    string formattedWord = match.Value;
    int leftAngleBracketIndex = formattedWord.IndexOf(">");
    var word = formattedWord.Remove(0, leftAngleBracketIndex + 1);
    word = word.TrimEnd('<');
    resultsList.Add(word);
  }
}


private string SantinizeOutput(string input)
{
  var text = input.Replace("\n", "").Replace("\r", "");
  return Regex.Replace(text, "\\s+", " ");
}

The response above contains 7 such matches, but I only get 5 results.

What am I doing wrong?

Best answer

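\w means "word character"; it does not match whitespace (or a hyphen). Notice that two of the ref_result entries contain spaces, and the first one also contains a hyphen, so \w+ cannot reach the closing < in either of them: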

<span class="ref_result">относиться по-матерински<wrs>
<span class="ref_result">родительский элемент<wrs>

Just use "ref_result\">[^<]+<wrs" instead, which matches any run of characters that are not the start of a tag.
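
As a minimal sketch of how the corrected pattern could be used, the version below wraps the text in a capture group so the word can be read from Groups[1] rather than trimmed out of match.Value by hand. The class name RefResultExtractor and the method name Extract are hypothetical, introduced only for illustration; the input is assumed to be the already sanitized HTML string from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class RefResultExtractor
{
    // [^<]+ accepts spaces, hyphens and Cyrillic letters alike;
    // the parentheses capture only the text between ref_result"> and <wrs.
    private static readonly Regex RefResult =
        new Regex("ref_result\">([^<]+)<wrs", RegexOptions.Compiled);

    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match match in RefResult.Matches(html))
        {
            // Group 0 is the whole match; group 1 is the captured text.
            results.Add(match.Groups[1].Value);
        }
        return results;
    }
}
```

Calling RefResultExtractor.Extract(text) on the sanitized response should then also return the two entries that \w+ skipped, because [^<]+ does not care which characters appear before the next tag.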

This question, "c# - How do I write a correct regular expression to extract text?", comes from Stack Overflow: https://stackoverflow.com/questions/12654993/
