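I am writing a web crawler for a specific site. The application is a VB.Net Windows Forms application that does not use multiple threads, so every web request is made sequentially. However, after ten successful page retrievals, every subsequent request times out.
I have looked at the similar questions already posted on SO and implemented the recommended techniques in my GetPage routine, shown below: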
Public Function GetPage(ByVal url As String) As String
    Dim result As String = String.Empty
    Dim uri As New Uri(url)
    ' Raise the per-host connection limit for this service point.
    Dim sp As ServicePoint = ServicePointManager.FindServicePoint(uri)
    sp.ConnectionLimit = 100
    Dim request As HttpWebRequest = DirectCast(WebRequest.Create(uri), HttpWebRequest)
    request.KeepAlive = False
    request.Timeout = 15000
    Try
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using dataStream As Stream = response.GetResponseStream()
                Using reader As New StreamReader(dataStream)
                    If response.StatusCode <> HttpStatusCode.OK Then
                        ' Use & and ToString() so the enum is concatenated as text.
                        Throw New Exception("Got response status code: " & response.StatusCode.ToString())
                    End If
                    result = reader.ReadToEnd()
                End Using
            End Using
            response.Close()
        End Using
    Catch ex As Exception
        Dim msg As String = "Error reading page """ & url & """. " & ex.Message
        Logger.LogMessage(msg, LogOutputLevel.Diagnostics)
    End Try
    Return result
End Function
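Am I missing something? Am I failing to close or dispose of an object that should be closed? It seems strange that it always happens after exactly ten consecutive requests.
Note: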
ServicePointManager.DefaultConnectionLimit = 100
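For context, here is a minimal sketch of where a setting like this would usually go, assuming it is applied once at startup before the first request is made. The module and method names are hypothetical and do not come from the original application.

```vbnet
Imports System.Net

Module NetworkSetup
    ' Hypothetical helper: call once at application startup,
    ' before any request is created.
    Public Sub ConfigureConnections()
        ' Only ServicePoint objects created after this line runs
        ' pick up the new default; existing ones keep their limit.
        ServicePointManager.DefaultConnectionLimit = 100
    End Sub
End Module
```

Because the default only affects service points created afterwards, setting ConnectionLimit on the specific ServicePoint (as GetPage does) covers the case where the service point already exists.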
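Edit
I added a delay of 2 to 7 seconds between web requests so that I am not "hammering" the site or effectively attempting a DoS attack. However, the problem persists.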
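A delay like the one described could be sketched as follows. This is only an illustration under the assumption that the pause is a blocking sleep on the calling thread; the module and method names are hypothetical.

```vbnet
Imports System
Imports System.Threading

Module Throttle
    ' One shared generator, so successive calls do not repeat a seed.
    Private ReadOnly Rng As New Random()

    ' Hypothetical helper: blocks the calling thread for 2 to 7 seconds.
    Public Sub PauseBetweenRequests()
        ' Next(min, max) excludes max, so 7001 allows a full 7000 ms.
        Thread.Sleep(Rng.Next(2000, 7001))
    End Sub
End Module
```

In a single-threaded Windows Forms application, a blocking sleep like this also freezes the UI for the duration of the pause.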
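Best answer
I ran into this problem today, and my solution was to make sure the response is always closed.
I think you need to call response.Close() before throwing the exception inside the Using block.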
Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
    Using dataStream As Stream = response.GetResponseStream()
        Using reader As New StreamReader(dataStream)
            If response.StatusCode <> HttpStatusCode.OK Then
                ' Close the response explicitly before throwing.
                response.Close()
                ' Use & and ToString() so the enum is concatenated as text.
                Throw New Exception("Got response status code: " & response.StatusCode.ToString())
            End If
            result = reader.ReadToEnd()
        End Using
    End Using
    response.Close()
End Using
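A related case that may be worth covering is a failing GetResponse call: for protocol errors such as 404 or 500, it throws a WebException whose Response property holds a response of its own, and that response also keeps a connection open until it is closed. The sketch below is one possible way to handle it, not necessarily the cause of the timeouts in the question; GetPageSafely and PageFetcher are hypothetical names, and the logging from the original routine is omitted.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module PageFetcher
    ' Hypothetical variant of GetPage; returns an empty string on failure.
    Public Function GetPageSafely(ByVal url As String) As String
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Timeout = 15000
        Try
            ' Using disposes the response on every exit path from this block.
            Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
                Using reader As New StreamReader(response.GetResponseStream())
                    Return reader.ReadToEnd()
                End Using
            End Using
        Catch ex As WebException
            ' Protocol errors carry their own response object; close it
            ' so the underlying connection is released.
            If ex.Response IsNot Nothing Then
                ex.Response.Close()
            End If
            Return String.Empty
        End Try
    End Function
End Module
```

With this shape, the explicit status check becomes mostly unnecessary, since error status codes surface as a WebException rather than as a normal response.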
Regarding ".net - HttpWebRequest times out after 10 consecutive requests", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/1191926/