c# - How do I download an HTML page with the correct encoding using C#?

I need a method that takes a link to a page and returns that page's title.

I used WebClient:

        var webClient = new WebClient();
        var htmlString = webClient.DownloadString(_link);

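It works well, but it fails on pages in non-English encodings: instead of the text I need, I get question marks and strange characters.

Is there a general way to detect a page's encoding and use it? It needs to support most encodings, if not all of them.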

Best answer

With HtmlAgilityPack you can do it like this:

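using System.Net;
using HtmlAgilityPack;

using (WebClient client = new WebClient())
using (var stream = client.OpenRead("http://your.com"))
{
    HtmlDocument doc = new HtmlDocument();
    // true = detect the encoding from byte order marks at the start of the stream
    doc.Load(stream, true);
    // SelectSingleNode returns null when the page has no <title> element
    var titleNode = doc.DocumentNode.SelectSingleNode("//title");
    var title = titleNode != null ? titleNode.InnerText : null;
}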
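
If adding a dependency is not an option, the same idea can be approximated by hand. The following is only a minimal sketch, not part of the original answer: the class name HtmlEncodingSniffer and its methods are hypothetical, the 1024-byte scan window is an arbitrary assumption, and the lookup order (byte order mark, then the charset in the Content-Type header, then a charset declared in a meta tag, then UTF-8) is a simplification of what browsers do. On .NET Core and later, legacy code pages such as windows-1251 may need CodePagesEncodingProvider to be registered first; the sketch falls through to the next step when a name is not recognized.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: picks an encoding for raw HTML bytes.
public static class HtmlEncodingSniffer
{
    // Matches charset=... in a Content-Type header value or in a <meta> tag.
    private static readonly Regex CharsetPattern = new Regex(
        @"charset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static Encoding Resolve(string contentType, byte[] body)
    {
        // 1. A byte order mark is the strongest signal.
        if (body.Length >= 3 && body[0] == 0xEF && body[1] == 0xBB && body[2] == 0xBF)
            return Encoding.UTF8;
        if (body.Length >= 2 && body[0] == 0xFF && body[1] == 0xFE)
            return Encoding.Unicode;           // UTF-16 little endian
        if (body.Length >= 2 && body[0] == 0xFE && body[1] == 0xFF)
            return Encoding.BigEndianUnicode;  // UTF-16 big endian

        // 2. charset parameter of the HTTP Content-Type header, if any.
        Encoding fromHeader = TryParse(contentType);
        if (fromHeader != null) return fromHeader;

        // 3. charset declared in the markup; the declaration itself is ASCII,
        //    so decoding the first bytes as ASCII is enough to find it.
        int prefixLength = Math.Min(body.Length, 1024); // assumed scan window
        Encoding fromMeta = TryParse(Encoding.ASCII.GetString(body, 0, prefixLength));
        if (fromMeta != null) return fromMeta;

        // 4. Nothing declared: assume UTF-8.
        return Encoding.UTF8;
    }

    public static string Decode(string contentType, byte[] body)
    {
        string text = Resolve(contentType, body).GetString(body);
        // GetString keeps a decoded byte order mark as U+FEFF; drop it.
        return text.TrimStart('\uFEFF');
    }

    private static Encoding TryParse(string text)
    {
        if (string.IsNullOrEmpty(text)) return null;
        Match match = CharsetPattern.Match(text);
        if (!match.Success) return null;
        try
        {
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            return null; // unknown or unregistered encoding name
        }
    }
}
```

To use it with WebClient, fetch the raw bytes with client.DownloadData(_link) instead of DownloadString, read client.ResponseHeaders["Content-Type"] after the call, and pass both to HtmlEncodingSniffer.Decode. Pages that declare no encoding at all, or declare the wrong one, will still decode incorrectly; handling those would require statistical detection, which this sketch does not attempt.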

Regarding "c# - How do I download an HTML page with the correct encoding using C#?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/5102997/
