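I get an object reference error when parsing an external HTML file. I think it happens because not all of the selected elements have a class attribute. Here is my code: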
foreach (HtmlNode link in doc.DocumentNode.Descendants("li").Where(i => i.Attributes["class"].Value == "name"))
{
    string result = link.InnerText.Trim().Replace("&nbsp;", "");
    Console.WriteLine(result);
}
How can I select only the values whose class name is "name"?
Here is the HTML I am trying to parse:
<li>
<span class="name">
<a href="/players/joe-bloggs.html">Joe, Bloggs</a>
</span>
<span class="country">
<img src="/img/flags/15x15/USA.gif" alt="USA"/>
United States
</span>
</li>
<li>
<span class="name">
<a href="/players/joe-bloggs.html">Joe, Bloggs</a>
</span>
<span class="country">
<img src="/img/flags/15x15/USA.gif" alt="USA"/>
United States
</span>
</li>
<li>
<span class="name">
<a href="/players/joe-bloggs.html">Joe, Bloggs</a>
</span>
<span class="country">
<img src="/img/flags/15x15/RSA.gif" alt="RSA"/>
South Africa
</span>
</li>
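Best answer
You should select the a elements rather than the li elements. In your HTML it is the span elements that carry the class attribute; the li elements have none, so i.Attributes["class"] returns null and reading .Value throws the object reference error. I suggest using an XPath expression with a predicate: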
var links = doc.DocumentNode.SelectNodes("//li/span[@class='name']/a");
For each li, this XPath selects the child span elements whose class attribute equals name, and then the a elements inside them.
foreach (var a in links)
    Console.WriteLine(a.InnerText);
The output for your sample HTML is:
Joe, Bloggs
Joe, Bloggs
Joe, Bloggs
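If you would rather stay with LINQ than XPath, the same selection can be sketched roughly as below. This is only an illustration, not part of the original answer: the NameExtractor class and ExtractNames method are hypothetical names, and the HTML is assumed to be available as a string. GetAttributeValue returns the supplied default when the attribute is missing, which avoids the null dereference from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
public static class NameExtractor
{
    public static List<string> ExtractNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("span")
            // GetAttributeValue falls back to "" when there is no class attribute,
            // so elements without a class are skipped instead of throwing.
            .Where(s => s.GetAttributeValue("class", "") == "name")
            // Element("a") returns the first child <a>, or null if there is none.
            .Select(s => s.Element("a"))
            .Where(a => a != null)
            .Select(a => a.InnerText.Trim())
            .ToList();
    }
}
```

One related thing to watch for with the XPath version: SelectNodes returns null rather than an empty collection when nothing matches, so it is worth checking the result for null before the foreach.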
Side note: you can use HttpUtility.HtmlDecode(a.InnerText) to get the decoded text, so that every HTML entity is decoded and not only &nbsp; is replaced.
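To make the decoding step concrete, here is a small sketch. It swaps in WebUtility.HtmlDecode from System.Net, which does the same job as HttpUtility.HtmlDecode without needing a reference to System.Web; that substitution, and the TextCleaner and DecodeAndNormalize names, are assumptions for illustration. Note that a decoded &nbsp; becomes the non-breaking space character U+00A0, not an ordinary space, so the sketch maps it to a regular space afterwards.

```csharp
using System.Net;

// Hypothetical helper, named for illustration only.
public static class TextCleaner
{
    public static string DecodeAndNormalize(string raw)
    {
        // Decode every HTML entity (&nbsp;, &amp;, &lt;, ...), not just one.
        string decoded = WebUtility.HtmlDecode(raw);

        // &nbsp; decodes to U+00A0; replace it with a regular space, then trim.
        return decoded.Replace('\u00A0', ' ').Trim();
    }
}
```

For example, TextCleaner.DecodeAndNormalize("Joe,&nbsp;Bloggs &amp; Co") yields "Joe, Bloggs & Co".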
Update: parsing players
var players = from p in doc.DocumentNode.SelectNodes("//li")
              let name = p.SelectSingleNode("span[@class='name']/a")
              let country = p.SelectSingleNode("span[@class='country']")
              select new
              {
                  Name = (name == null) ? null :
                      HttpUtility.HtmlDecode(name.InnerText.Trim()),
                  Country = (country == null) ? null :
                      HttpUtility.HtmlDecode(country.InnerText.Trim())
              };
Result:
[
  {
    Name: "Joe, Bloggs",
    Country: "United States"
  },
  {
    Name: "Joe, Bloggs",
    Country: "United States"
  },
  {
    Name: "Joe, Bloggs",
    Country: "South Africa"
  }
]
Regarding "c# - Select values from elements with a specific class name", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/22143514/