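I want to use C# to get the first few words (100 or 200) from a long summary (plain string or HTML).
My requirement is to show a short description of a long content summary, and that content may contain HTML elements. This works for plain strings, but when the content is HTML, elements get cut off in the middle and I end up with something like this: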
<span style="FONT-FAMILY: Trebuchet MS">Heading</span>
</H3><span style="FONT-FAMILY: Trebuchet MS">
<font style="FONT-SIZE: 15px;
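But it should return a string in which every HTML element is complete.
I use a Yahoo UI editor to get content from the user, and I pass that text to the method below to get the short summary: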
public static string GetFirstFewWords(string input, int numberWords)
{
    if (input.Split(new char[] { ' ' },
            StringSplitOptions.RemoveEmptyEntries).Length > numberWords)
    {
        // Number of words we still want to display.
        int words = numberWords;
        // Loop through entire summary.
        for (int i = 0; i < input.Length; i++)
        {
            // Decrement the remaining-word count on each space.
            if (input[i] == ' ')
            {
                words--;
            }
            // If we have no more words to display, return the substring.
            if (words == 0)
            {
                return input.Substring(0, i);
            }
        }
        return string.Empty;
    }
    else
    {
        return input;
    }
}
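For the plain-string case only, the manual space-counting loop can be replaced by splitting on whitespace and re-joining the first N words. This is a minimal sketch, not the asker's code: the class name `TextSummary` is hypothetical, and it treats any run of whitespace (spaces, tabs, newlines) as one separator. It does nothing about HTML, so it does not solve the cut-element problem.

```csharp
using System;
using System.Linq;

// Hypothetical helper: first `numberWords` words of plain text.
public static class TextSummary
{
    public static string GetFirstFewWords(string input, int numberWords)
    {
        if (string.IsNullOrEmpty(input) || numberWords <= 0)
            return string.Empty;

        // A null separator array makes Split break on every whitespace character;
        // RemoveEmptyEntries drops the empty strings that repeated whitespace would create.
        string[] words = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Already short enough: return the text unchanged.
        if (words.Length <= numberWords)
            return input;

        // Re-join the first numberWords words with single spaces.
        return string.Join(" ", words.Take(numberWords));
    }
}
```

Note that a truncated result has its whitespace normalized to single spaces, while a short input is returned untouched, which matches the original method's `return input;` branch.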
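I'm trying to take article content from users and show a short summary of it on a listing page.
Best answer
Have you considered using the Html Agility Pack?
It isn't perfect, but here is an idea that does (more or less) what you're after: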
using System;
using System.Text;
using HtmlAgilityPack;

// Retrieve a summary of html with no less than 'max' words. Whole top-level
// nodes are kept, so the summary can run past 'max' words.
string GetSummary(string html, int max)
{
    StringBuilder summaryHtml = new StringBuilder();
    // load our html document
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);
    int wordCount = 0;
    foreach (var element in htmlDoc.DocumentNode.ChildNodes)
    {
        // stop once we already have at least 'max' words
        if (wordCount >= max)
        {
            break;
        }
        // inner text will strip out all html, and give us plain text
        string elementText = element.InnerText;
        // split on any whitespace, skipping empty entries, to count the words
        string[] elementWords = elementText.Split((char[])null,
            StringSplitOptions.RemoveEmptyEntries);
        // add the *outer* HTML (which will have proper
        // html formatting for this fragment) to the summary
        summaryHtml.Append(element.OuterHtml);
        wordCount += elementWords.Length;
    }
    return summaryHtml.ToString();
}
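The approach above keeps or drops whole top-level nodes, so one large `<div>` can pull the entire article into the summary. If the cut needs to land inside an element, one alternative (not from the original answer, and not using Html Agility Pack) is to tokenize the input into tags and text, count words only in the text, remember which tags are open, and close them at the cut. This is a sketch under stated assumptions: `HtmlSummary` and `Truncate` are hypothetical names, and the regex tokenizer is not a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: truncates HTML after maxWords words of text and closes
// any tags left open at the cut. Assumes well-nested tags and no '<' or '>'
// inside attribute values, comments or scripts. Entities such as &nbsp; count as text.
public static class HtmlSummary
{
    // Either a whole tag "<...>" or a run of text between tags.
    private static readonly Regex Token = new Regex(@"<[^>]*>|[^<]+");

    // Optional closing slash (group 1) and tag name (group 2).
    private static readonly Regex TagName = new Regex(@"^<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)");

    // A word is any run of non-whitespace characters.
    private static readonly Regex Word = new Regex(@"\S+");

    // Elements that never have a closing tag.
    private static readonly HashSet<string> VoidTags = new HashSet<string>(
        new[] { "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr" },
        StringComparer.OrdinalIgnoreCase);

    public static string Truncate(string html, int maxWords)
    {
        if (string.IsNullOrEmpty(html) || maxWords <= 0)
            return string.Empty;

        var output = new StringBuilder();
        var open = new Stack<string>(); // tags opened in the output but not yet closed
        int words = 0;

        foreach (Match m in Token.Matches(html))
        {
            string token = m.Value;

            if (token[0] == '<')
            {
                Match tag = TagName.Match(token);
                if (tag.Success)
                {
                    string name = tag.Groups[2].Value;
                    if (tag.Groups[1].Value == "/")
                    {
                        // Closing tag: pop its opener (assumes proper nesting).
                        if (open.Count > 0 &&
                            string.Equals(open.Peek(), name, StringComparison.OrdinalIgnoreCase))
                            open.Pop();
                    }
                    else if (!token.EndsWith("/>") && !VoidTags.Contains(name))
                    {
                        open.Push(name);
                    }
                }
                output.Append(token); // tags are copied through unchanged
                continue;
            }

            // Text: find where the maxWords-th word ends, if it falls in this run.
            int cut = -1;
            foreach (Match w in Word.Matches(token))
            {
                words++;
                if (words == maxWords)
                {
                    cut = w.Index + w.Length;
                    break;
                }
            }

            if (cut >= 0)
            {
                output.Append(token, 0, cut);
                break; // word budget spent; ignore the rest of the input
            }
            output.Append(token);
        }

        // Close every tag still open, innermost first.
        while (open.Count > 0)
            output.Append("</").Append(open.Pop()).Append('>');

        return output.ToString();
    }
}
```

Because only text runs count toward the budget, a wrapper such as `<span style="FONT-FAMILY: Trebuchet MS">` is copied whole or not at all, and its closing tag is appended if the cut falls inside it. The trade-off versus Html Agility Pack is robustness: a real parser copes with messy user markup that this tokenizer does not.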
For a similar question, "c# - Get the first few words from a long summary (plain string or HTML)", see Stack Overflow: https://stackoverflow.com/questions/1577361/