vb.net - Preventing HTMLAgilityPack errors in VB.Net

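I'm using HTMLAgilityPack to parse HTML pages. Sometimes, however, I end up trying to parse the wrong kind of data (in this case an image), which of course fails for obvious reasons.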

Private Sub parseHtml(ByVal content As String, ByVal url As String)
    Try
        Dim contentHash As String = hashGenerator.ComputeHash(content, "SHA1")
        Dim doc As HtmlDocument = New HtmlDocument()

        doc.Load(New StringReader(content))

        Dim root As HtmlNode = doc.DocumentNode
        Dim anchorTags As New List(Of String)

        For Each link As HtmlNode In root.SelectNodes("//a")
            cururl = link.OuterHtml
            If link.Attributes("href") Is Nothing Then Continue For
            If Uri.IsWellFormedUriString(link.Attributes("href").Value, UriKind.Absolute) Then
                urlQueue.Enqueue(link.Attributes("href").Value)
            Else
                Dim myUri As New Uri(url)
                urlQueue.Enqueue(myUri.Scheme & "://" & myUri.Host & link.Attributes("href").Value)
            End If
        Next
    Catch ex As Exception
        MsgBox(ex.Message, MsgBoxStyle.Critical, "Error (parseHtml(" & url & "))")
    End Try
End Sub

The error I get is:

A first chance exception of type 'System.NullReferenceException' occurred in Webcrawler.exe
Object reference not set to an instance of an object.

As for the content I tried to parse:

��Iޥ�+�: 8�0�x�

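How can I check whether the content is "parseable" before trying to parse it, so that I can prevent the error?

Right now it is an image that makes the error window pop up, but I assume it could be anything that isn't (X)HTML.

Thanks in advance, great community :)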

Best answer

You need to check the Content-Type header of the response before you try to parse the returned data.

For an HTML page it should be text/html, and for XHTML it should be application/xhtml+xml.
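
The check described above could look roughly like the following. This is a minimal sketch, not the asker's actual download code: `IsHtmlContentType` and `DownloadIfHtml` are hypothetical helper names, and it assumes the page is fetched with `HttpWebRequest`, which the question does not show. A real header usually carries parameters such as `; charset=utf-8`, so only the media type before the first semicolon is compared.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module ContentTypeCheck
    ' Hypothetical helper: True when the Content-Type header value
    ' names HTML or XHTML. Parameters such as "; charset=utf-8" are ignored.
    Function IsHtmlContentType(ByVal contentType As String) As Boolean
        If String.IsNullOrEmpty(contentType) Then Return False

        ' Keep only the media type, i.e. the part before the first ";".
        Dim mediaType As String = contentType.Split(";"c)(0).Trim()

        Return mediaType.Equals("text/html", StringComparison.OrdinalIgnoreCase) OrElse _
               mediaType.Equals("application/xhtml+xml", StringComparison.OrdinalIgnoreCase)
    End Function

    ' Hypothetical helper: returns the response body, or Nothing when the
    ' server reports something other than (X)HTML (for example an image).
    Function DownloadIfHtml(ByVal url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)

        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            If Not IsHtmlContentType(response.ContentType) Then Return Nothing

            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

With something like this in place, `parseHtml` would only be called when `DownloadIfHtml` returns a value other than `Nothing`. Separately, HtmlAgilityPack's `SelectNodes` returns `Nothing` when no node matches the XPath expression, so checking its result before the `For Each` loop is a sensible second guard even for genuine HTML pages that contain no links.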

Regarding "vb.net - Preventing HTMLAgilityPack errors in VB.Net", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4534055/
