c# - Can HtmlAgilityPack handle an XML file that comes with an XSL file to render HTML?

Tags: c# html-agility-pack

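I would like to know the best way to have HtmlAgilityPack read an XML file that references an XSL file used to render HTML. Are there any settings on the HtmlDocument class that help with this, or do I have to find a way to perform the transformation first and then load the result with HtmlAgilityPack? If it is the latter, does anyone know a good library or approach for doing this kind of transformation? Below is an example of a website that returns XML with an XSL file, along with the code I would like to use.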

var uri = new Uri("http://www.skechers.com/");
var request = (HttpWebRequest)WebRequest.Create(uri);
var cookieContainer = new CookieContainer();

request.CookieContainer = cookieContainer;
request.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
request.Method = "GET";
request.AllowAutoRedirect = true;
request.Timeout = 15000;

var response = (HttpWebResponse)request.GetResponse();
var page = new HtmlDocument();
page.OptionReadEncoding = false;
var stream = response.GetResponseStream();
page.Load(stream); 

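This code does not throw any errors, but what gets parsed is the raw XML rather than the transformed output, and the transformed output is what I want.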

Best answer

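Html Agility Pack can help you on two points:

1) It is fairly easy to get at the XML processing instruction, because Html Agility Pack parses the PI data as HTML, so its contents are turned into attributes.

2) HtmlDocument implements IXPathNavigable, so it can be transformed directly by the .NET XSLT transformation engine.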
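
Both points can be tried offline on an in-memory document. What follows is only a minimal sketch and not part of the original answer: the sample XML, the stylesheet, and the names `HapXsltSketch`, `GetStylesheetHref` and `Transform` are made up for illustration, and it assumes a Html Agility Pack build in which HtmlDocument exposes its XPath support (IXPathNavigable).

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using HtmlAgilityPack;

public static class HapXsltSketch
{
    // Hypothetical sample document carrying an xml-stylesheet processing instruction.
    private const string SampleXml =
        "<?xml-stylesheet type=\"text/xsl\" href=\"/style.xsl\"?>" +
        "<catalog><item>Shoe</item></catalog>";

    // Hypothetical stylesheet that renders the first <item> as an HTML heading.
    private const string SampleXslt =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"html\"/>" +
        "<xsl:template match=\"/\">" +
        "<html><body><h1><xsl:value-of select=\"//item\"/></h1></body></html>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Point 1: the processing instruction is looked up as a node named
    // "?xml-stylesheet" and its href is read like an ordinary attribute.
    public static string GetStylesheetHref(HtmlDocument doc)
    {
        HtmlNode pi = doc.DocumentNode.SelectSingleNode("//*[name()='?xml-stylesheet']");
        return pi == null ? null : pi.GetAttributeValue("href", null);
    }

    // Point 2: the HtmlDocument is handed to XslCompiledTransform as an IXPathNavigable.
    public static string Transform(HtmlDocument doc, string xsltText)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader reader = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(reader);
        }
        using (StringWriter output = new StringWriter())
        {
            xslt.Transform(doc, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(SampleXml);
        Console.WriteLine(GetStylesheetHref(doc));
        Console.WriteLine(Transform(doc, SampleXslt));
    }
}
```

On a real site the stylesheet has to be downloaded first, which is what the full code in this answer does.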

Here is a piece of working code. I had to add a specific XmlResolver to handle the XSLT transformation properly, but I think that is specific to this skechers case.

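// namespaces used by the code below
using System;
using System.Net;
using System.Text;
using System.Xml;
using System.Xml.Xsl;
using HtmlAgilityPack;

public static void DownloadAndProcessXml(string url, string userAgent, string outputFilePath)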
{
    using (XmlTextWriter writer = new XmlTextWriter(outputFilePath, Encoding.UTF8))
    {
        DownloadAndProcessXml(url, userAgent, writer);
    }
}

public static void DownloadAndProcessXml(string url, string userAgent, XmlWriter output)
{
    UserAgentXmlUrlResolver resolver = new UserAgentXmlUrlResolver(url, userAgent);

    // WebClient is an easy to use class.
    using (WebClient client = new WebClient())
    {
        // download Xml doc. set User-Agent header or the site won't answer us...
        client.Headers[HttpRequestHeader.UserAgent] = resolver.UserAgent;
        HtmlDocument xmlDoc = new HtmlDocument();
        xmlDoc.Load(client.OpenRead(url));

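        // determine xslt (note the xpath trick: Html Agility Pack does not support xml processing
        // instructions, so the PI is looked up as a node named "?xml-stylesheet")
        HtmlNode pi = xmlDoc.DocumentNode.SelectSingleNode("//*[name()='?xml-stylesheet']");
        if (pi == null)
            throw new InvalidOperationException("No xml-stylesheet processing instruction found at " + url);
        string xsltUrl = pi.GetAttributeValue("href", null);

        // download Xslt doc (href is resolved against the page url, so relative and absolute values both work)
        client.Headers[HttpRequestHeader.UserAgent] = resolver.UserAgent;
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(new XmlTextReader(client.OpenRead(new Uri(new Uri(url), xsltUrl))), new XsltSettings(true, false), null);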

        // transform Html/Xml doc into new Xml doc, easy as HtmlDocument implements IXPathNavigable
        // note the custom resolver: it handles the resolve requests this particular Xslt makes
        xslt.Transform(xmlDoc, null, output, resolver);
    }
}

// This class is needed during transformation otherwise there are errors.
// This is probably due to this very specific Xslt file that needs to go back to the root document itself.
public class UserAgentXmlUrlResolver : XmlUrlResolver
{
    public UserAgentXmlUrlResolver(string rootUrl, string userAgent)
    {
        RootUrl = rootUrl;
        UserAgent = userAgent;
    }

    public string RootUrl { get; set; }
    public string UserAgent { get; set; }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        WebClient client = new WebClient();
        if (!string.IsNullOrEmpty(UserAgent))
        {
            client.Headers[HttpRequestHeader.UserAgent] = UserAgent;
        }
        return client.OpenRead(absoluteUri);
    }

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        if ((relativeUri == "/") && (!string.IsNullOrEmpty(RootUrl)))
            return new Uri(RootUrl);

        return base.ResolveUri(baseUri, relativeUri);
    }
}

You call it like this:

    string url = "http://www.skechers.com/";
    string ua = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
    DownloadAndProcessXml(url, ua, "skechers.html");
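
When the response is well-formed XML, the other route raised in the question (transform first, then load the result into Html Agility Pack) needs nothing beyond System.Xml. The following is a minimal sketch under that assumption, using made-up in-memory XML and XSLT instead of a download; `TransformFirstSketch` and `TransformToHtml` are hypothetical names that do not appear in the original answer.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TransformFirstSketch
{
    // Applies the stylesheet to the XML text and returns the rendered output as a string.
    public static string TransformToHtml(string xmlText, string xsltText)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(xsltReader);
        }

        using (XmlReader xmlReader = XmlReader.Create(new StringReader(xmlText)))
        using (StringWriter html = new StringWriter())
        {
            // Writing to a TextWriter lets the stylesheet's xsl:output settings apply.
            xslt.Transform(xmlReader, null, html);
            return html.ToString();
        }
    }

    public static void Main()
    {
        // Hypothetical sample data, for illustration only.
        string xml = "<catalog><item>Shoe</item></catalog>";
        string xslt =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
            "<xsl:output method=\"html\"/>" +
            "<xsl:template match=\"/\">" +
            "<html><body><h1><xsl:value-of select=\"//item\"/></h1></body></html>" +
            "</xsl:template>" +
            "</xsl:stylesheet>";

        Console.WriteLine(TransformToHtml(xml, xslt));
    }
}
```

The returned string can then be passed to `HtmlDocument.LoadHtml` and queried as usual. This route only works if the downloaded document parses as strict XML; the Html Agility Pack route in the answer is more forgiving.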

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/5385010/
