c# - Translating an entire web page in C# with the Microsoft Translator API

Tags: c# asp.net microsoft-translator

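I am translating web pages with the Microsoft Translator API (SOAP) and C#. I want to translate my website, but the Translator Widget does not work for me because I also need Google to crawl the translated pages. So I need to translate each page on the server before sending it to the browser.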

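So far I have not found an API that lets you simply pass a URL and get the translated response back (I looked but could not find one; if you happen to know of one, please mention it), the way this page does: http://www.microsofttranslator.com/bv.aspx?from=&to=nl&a=http%3A%2F%2Fwww.imdb.com%2F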

Here is what I have tried so far:

1. Downloading the page from the URL as a string and passing it to Client.Translate(..). This fails with:

The formatter threw an exception while trying to deserialize the message: Error in deserializing body of request message for operation 'Translate'. The maximum string content length quota (30720) has been exceeded while reading XML data. This quota may be increased by changing the MaxStringContentLength property on the XmlDictionaryReaderQuotas object used when creating the XML reader. Line 516, position 48.

2. Translating each non-empty text node separately with HtmlAgilityPack:

// Translates every non-empty text node in place, one service call per node.
private static void processDocument(HtmlAgilityPack.HtmlDocument html, LanguageServiceClient Client)
{
    HtmlNodeCollection coll = html.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
    if (coll == null) return; // SelectNodes returns null when nothing matches

    foreach (HtmlNode node in coll)
    {
        if (node.InnerText == node.InnerHtml)
        {
            //node.InnerHtml = translateText(node.InnerText);
            node.InnerHtml = Client.Translate("", node.InnerText, "en", "fr", "text/html", "general");
        }
    }
}

This took far too long, and in the end I got a Bad Request (400) exception.

What is the best way to solve this? I also plan to save the translated document so that I do not have to translate it every time.

Best answer

This C# example translates the HTML in a local file:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using HtmlAgilityPack;

namespace TranslationAssistant.Business
{
class HTMLTranslationManager
{
    public static int DoTranslation(string htmlfilename, string fromlanguage, string tolanguage)
    {
        string htmldocument = File.ReadAllText(htmlfilename);
        string htmlout = string.Empty;

        HtmlDocument htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(htmldocument);
        htmlDoc.DocumentNode.SetAttributeValue("lang", TranslationServices.Core.TranslationServiceFacade.LanguageNameToLanguageCode(tolanguage));
        var title = htmlDoc.DocumentNode.SelectSingleNode("//head//title");
        if (title != null) title.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(title.InnerHtml, fromlanguage, tolanguage, "text/html");
        var body = htmlDoc.DocumentNode.SelectSingleNode("//body");
        if (body != null)
        {
            if (body.InnerHtml.Length < 10000)
            {
                body.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(body.InnerHtml, fromlanguage, tolanguage, "text/html");
            }
            else
            {
                List<HtmlNode> nodes = new List<HtmlNode>();
                AddNodes(body.FirstChild, ref nodes);

                Parallel.ForEach(nodes, (node) =>
                    {
                        if (node.InnerHtml.Length > 10000)
                        {
                            throw new Exception("Child node with a length of more than 10000 characters encountered.");
                        }
                        node.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(node.InnerHtml, fromlanguage, tolanguage, "text/html");
                    });
            }
        }
        htmlDoc.Save(htmlfilename, Encoding.UTF8);
        return 1;
    }

    /// <summary>
    /// Add nodes of size smaller than 10000 characters to the list, and recurse into the bigger ones.
    /// </summary>
    /// <param name="rootnode">The node to start from</param>
    /// <param name="nodes">Reference to the node list</param>
    private static void AddNodes(HtmlNode rootnode, ref List<HtmlNode> nodes)
    {
        string[] DNTList = { "script", "#text", "code", "col", "colgroup", "embed", "em", "#comment", "image", "map", "media", "meta", "source", "xml"};  //DNT - Do Not Translate - these nodes are skipped.
        // Visit rootnode and every sibling after it; stop when NextSibling returns null.
        for (HtmlNode child = rootnode; child != null; child = child.NextSibling)
        {
            if (!DNTList.Contains(child.Name.ToLowerInvariant()))
            {
                if (child.InnerHtml.Length > 10000)
                {
                    // Too large for a single request: split it by descending into its children.
                    AddNodes(child.FirstChild, ref nodes);
                }
                else if (child.InnerHtml.Trim().Length != 0)
                {
                    nodes.Add(child);
                }
            }
        }
    }

}
}

This is HTMLTranslationManager.cs from http://github.com/microsofttranslator/documenttranslator. It uses the helper function TranslateString() from TranslationServiceFacade.cs. You can simplify it by calling the translation service directly wherever TranslateString() is used.
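Since the question also mentions saving the translated document so it is not translated on every request, here is a minimal sketch of one way to do that. It is not part of the DocumentTranslator project: the TranslationCache class, its GetOrTranslate method, and the cache directory layout are all hypothetical names introduced for illustration. The translation call itself is passed in as a delegate, so whichever service call you use in place of TranslateString() can be plugged in.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration; not part of the DocumentTranslator project.
public static class TranslationCache
{
    // Returns the cached translation if one exists on disk. Otherwise calls
    // `translate` (assumed signature: html, from, to -> translated html),
    // stores the result, and returns it.
    public static string GetOrTranslate(
        string cacheDirectory, string html, string from, string to,
        Func<string, string, string, string> translate)
    {
        Directory.CreateDirectory(cacheDirectory);
        string path = Path.Combine(cacheDirectory, CacheKey(html, from, to) + ".html");

        if (File.Exists(path))
        {
            return File.ReadAllText(path, Encoding.UTF8);
        }

        string translated = translate(html, from, to);
        File.WriteAllText(path, translated, Encoding.UTF8);
        return translated;
    }

    // The key is a SHA-256 hash of the language pair and the source markup,
    // so a changed page or a different target language gets its own entry.
    private static string CacheKey(string html, string from, string to)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(from + "\n" + to + "\n" + html));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash) sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }
}
```

With the helper from the answer, the delegate could be something like (h, f, t) => TranslationServiceFacade.TranslateString(h, f, t, "text/html"). Hashing the source markup rather than the URL means an edited page is retranslated automatically, at the cost of leaving stale files behind that would need occasional cleanup.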

Source: a similar question on Stack Overflow about translating an entire web page in C# with the Microsoft Translator API: https://stackoverflow.com/questions/33541787/
