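I tried to do this on my own but failed.
I have an HTML document and I am trying to extract the addresses of all the images in it into a C# collection, but I'm not sure of the syntax. I'm using HtmlAgilityPack... here is what I have so far. Please advise.
The HTML is as follows: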
<div style='padding-left:12px;' id='myWeb123'>
<b>MyWebSite Pics</b>
<br /><br />
<img src="http://myWebSite.com/pics/HHTR_01.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="http://myWebSite.com/pics/HHTR_02.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="http://myWebSite.com/pics/HHTR_03.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="http://myWebSite.com/pics/HHTR_04.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="http://myWebSite.com/pics/HHTR_05.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="http://myWebSite.com/pics/HHTR_06.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="http://myWebSite.com/pics/HHTR_07.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="http://myWebSite.com/pics/HHTR_08.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="http://myWebSite.com/pics/HHTR_09.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="http://myWebSite.com/pics/HHTR_10.jpg" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<a href="http://www.myWebSite.com/" target="_blank" rel="nofollow">Source</a>
</div>
The C# code is as follows:
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.Load("FileName.html");
// Targets a specific node
HtmlNode someNode = document.GetElementbyId("myWeb123");
//HtmlNodeCollection linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
HtmlNodeCollection linkNodes = document.DocumentNode.SelectNodes("//div[@id='myWeb123']");
if (linkNodes != null)
{
    int count = 0;
    foreach(HtmlNode linkNode in linkNodes)
    {
        string linkTitle = linkNode.GetAttributeValue("src", string.Empty);
        Debug.Print("linkTitle = " + linkTitle);
        if (linkTitle == string.Empty)
        {
            HtmlNode imageNode = linkNode.SelectSingleNode("img[@alt]");
            if (imageNode != null)
            {
                Debug.Print("imageNode = " + imageNode.Attributes.ToString());
            }
        }
        count++;
        Debug.Print("count = " + count);
    }
}
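I tried using the HtmlAgilityPack documentation, but the package lacks examples, and without examples the information about its methods and classes is really hard for me to understand.
Best answer
Try this. Sorry if it doesn't build; I adapted it from our own code to fit your case.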
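// Needs: using System.Collections.Generic; using System.Web; using HtmlAgilityPack;
List<string> result = new List<string>();
// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection imageNodes = document.DocumentNode.SelectNodes("//img[@src]");
if (imageNodes != null)
{
    foreach (HtmlNode link in imageNodes)
    {
        HtmlAttribute att = link.Attributes["src"];
        string temp = att.Value;
        string urlValue;
        // Decode repeatedly until the value stops changing,
        // so that multiply-encoded URLs are fully decoded.
        do
        {
            urlValue = temp;
            temp = HttpUtility.UrlDecode(HttpUtility.HtmlDecode(urlValue));
        } while (temp != urlValue);
        result.Add(temp);
    }
}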
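For reference, the attempt in the question probably prints nothing useful because its XPath selects the div itself, which has no src attribute, and SelectSingleNode then returns only the first matching img. Below is a minimal sketch of one way to limit the search to images inside the myWeb123 div when the page source is already held in a string. The ImageSourceExtractor class and GetImageSources method are hypothetical names made up for this illustration, not part of HtmlAgilityPack; the library calls it uses are LoadHtml, SelectNodes, and GetAttributeValue. It skips the URL-decoding loop and assumes the id contains no single quote.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class ImageSourceExtractor
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the src value of
    // every <img> nested inside the <div> whose id equals containerId.
    public static List<string> GetImageSources(string html, string containerId)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parse from a string instead of a file

        var sources = new List<string>();

        // The "//img[@src]" step after the div matches img descendants at any depth.
        // Assumption: containerId contains no single quote, since it is spliced
        // into the XPath expression without escaping.
        HtmlNodeCollection images = document.DocumentNode.SelectNodes(
            "//div[@id='" + containerId + "']//img[@src]");

        // SelectNodes returns null when nothing matches.
        if (images == null)
        {
            return sources;
        }

        foreach (HtmlNode image in images)
        {
            sources.Add(image.GetAttributeValue("src", string.Empty));
        }
        return sources;
    }

    static void Main()
    {
        // Shortened stand-in for the HTML in the question, plus one image outside the div.
        string html =
            "<div style='padding-left:12px;' id='myWeb123'>" +
            "<img src=\"http://myWebSite.com/pics/HHTR_01.jpg\" alt='myWebSitePics' /><br />" +
            "<img src=\"http://myWebSite.com/pics/HHTR_02.jpg\" alt='myWebSitePics' /><br />" +
            "</div>" +
            "<img src=\"http://myWebSite.com/pics/outside.jpg\" />";

        foreach (string src in GetImageSources(html, "myWeb123"))
        {
            Console.WriteLine(src);
        }
    }
}
```

If every image on the page is wanted rather than only those inside the div, the plain "//img[@src]" expression from the answer is enough.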
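Regarding "c# - How do I read an HTML document in C#, given that I have the web page source stored in a string variable?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8266607/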