c# - NullReferenceException in HtmlAgilityPack

Tags: c# html-agility-pack

I am trying to use XPath to extract a link from the URL below:

string url = "http://www.album-cover-art.org/search.php?q=Ruin+-+Live+Album+Version+Lamb+of+God";

My code:

HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc = web.Load(url); // exception is thrown here (Program.cs, line 23)

if (htmlDoc.DocumentNode != null)
{
    HtmlNode linkNode = htmlDoc.DocumentNode.SelectSingleNode(".//*[@id='related_search_row']/img/@src");
    if (linkNode != null)
        Console.WriteLine(linkNode.InnerText);
}

The code above compiles fine, but it throws an exception when I run it:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.

Full stack trace:

System.NullReferenceException: Object reference not set to an instance of an object.
   at HtmlAgilityPack.HtmlDocument.ReadDocumentEncoding(HtmlNode node) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1916
   at HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32 index, Boolean close) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1805
   at HtmlAgilityPack.HtmlDocument.Parse() in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1468
   at HtmlAgilityPack.HtmlDocument.Load(TextReader reader) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 769
   at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1515
   at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563
   at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1149
   at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107
   at ScreenScrapping.Program.Main(String[] args) in c:\Users\ranveer\csharp\ScreenScrapping\ScreenScrapping\Program.cs:line 23

So my question is: why does this exception occur?

Best answer

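This is a bug in HtmlAgilityPack. The document you are trying to parse contains <meta http-equiv="Content-Type" content="text/html; charset=iso-utf-8">, and HtmlAgilityPack cannot resolve that charset value (iso-utf-8) to a valid encoding name. As Simon Mourier said, this is a bug that was introduced in 1.4.0.0.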
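
To illustrate why that charset value is a problem, here is a minimal sketch, independent of HtmlAgilityPack, that checks whether .NET recognizes a given encoding name. The class and method names (CharsetCheck, IsKnownEncoding) are made up for this example; only Encoding.GetEncoding is a real framework call. Whether HtmlAgilityPack goes through exactly this call internally is an assumption; the stack trace only shows that the failure happens inside ReadDocumentEncoding.

```csharp
using System;
using System.Text;

static class CharsetCheck
{
    // Hypothetical helper: true when .NET can map the name to an Encoding.
    // Encoding.GetEncoding throws ArgumentException for names it does not know.
    public static bool IsKnownEncoding(string charsetName)
    {
        try
        {
            Encoding.GetEncoding(charsetName);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    static void Main()
    {
        // "utf-8" is a registered encoding name; "iso-utf-8" (the value in the page's meta tag) is not.
        Console.WriteLine("utf-8 known: " + IsKnownEncoding("utf-8"));
        Console.WriteLine("iso-utf-8 known: " + IsKnownEncoding("iso-utf-8"));
    }
}
```

A parser that assumes the lookup always succeeds has nothing valid to work with for this page, which is consistent with the NullReferenceException in the question.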

To avoid this, load the document from a stream yourself and set the encoding explicitly, like this:

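using System.Net;
using System.Text;
using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
// Do not let the parser read the encoding declared in the page's <meta> tag
htmlDoc.OptionReadEncoding = false;
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
using (var response = (HttpWebResponse)request.GetResponse())
{
    using (var stream = response.GetResponseStream())
    {
        // Decode the response as UTF-8 instead of the invalid declared charset
        htmlDoc.Load(stream, Encoding.UTF8);
    }
}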
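
The snippet above hard-codes UTF-8. As a possible variation, and not part of the original answer, the sketch below picks the encoding from the HTTP response's Content-Type header and falls back to UTF-8 when the header's charset is missing or unknown. SafeLoader, ResolveEncoding and Load are hypothetical names; HttpWebResponse.CharacterSet, HtmlDocument.OptionReadEncoding and HtmlDocument.Load(Stream, Encoding) are the only library members relied on. One caveat to check on your runtime: CharacterSet may report ISO-8859-1 when the server sends no charset at all, in which case the fallback never triggers.

```csharp
using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

static class SafeLoader
{
    // Hypothetical helper: maps a charset name to an Encoding and returns
    // the fallback when the name is empty or not recognized by .NET.
    public static Encoding ResolveEncoding(string charsetName, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charsetName))
            return fallback;
        try
        {
            return Encoding.GetEncoding(charsetName.Trim());
        }
        catch (ArgumentException)
        {
            return fallback;
        }
    }

    // Hypothetical helper: downloads the page and parses it without letting
    // the parser read the encoding declared in the <meta> tag.
    public static HtmlDocument Load(string url)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.OptionReadEncoding = false;

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            // Assumption: the HTTP header is more trustworthy than the page's meta tag.
            Encoding encoding = ResolveEncoding(response.CharacterSet, Encoding.UTF8);
            htmlDoc.Load(stream, encoding);
        }
        return htmlDoc;
    }
}
```

Calling SafeLoader.Load(url) would then replace the web.Load(url) call from the question; the rest of the XPath code can stay as it is.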

Source: "c# - NullReferenceException in HtmlAgilityPack" on Stack Overflow: https://stackoverflow.com/questions/10151993/
