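Translating a web page with the Microsoft Translator API (SOAP) and C#
I want to translate my website, but the Translator Widget does not work for me because I also need Google to crawl the translated pages. So I need to translate each page on the server before sending it to the browser.
So far I have not found an API (I looked but could not find one; if you happen to know of one, please mention it) that lets you pass just a URL and get the translated response back, the way this page does: http://www.microsofttranslator.com/bv.aspx?from=&to=nl&a=http%3A%2F%2Fwww.imdb.com%2F
Here is what I have tried so far:
1. Downloading the page as a string from the URL and passing it to Client.Translate(..), which fails with: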
The formatter threw an exception while trying to deserialize the message: Error in deserializing body of request message for operation 'Translate'. The maximum string content length quota (30720) has been exceeded while reading XML data. This quota may be increased by changing the MaxStringContentLength property on the XmlDictionaryReaderQuotas object used when creating the XML reader. Line 516, position 48.
2. Translating each text node separately with HtmlAgilityPack:
    private static void processDocument(HtmlAgilityPack.HtmlDocument html, LanguageServiceClient Client)
    {
        HtmlNodeCollection coll = html.DocumentNode.SelectNodes("//text()[normalize-space(.) != '']");
        foreach (HtmlNode node in coll)
        {
            if (node.InnerText == node.InnerHtml)
            {
                //node.InnerHtml = translateText(node.InnerText);
                node.InnerHtml = Client.Translate("", node.InnerText, "en", "fr", "text/html", "general");
            }
        }
    }
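This took far too long, and in the end I got a Bad Request (400) exception.
What is the best way to solve this? I also plan to save the translated documents so that I do not have to translate them on every request.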
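Because the plan is to save each translated page rather than translate it on every request, a file-based cache is one way to do that. The sketch below is only an illustration and is not part of the Translator API: TranslationCache, GetOrAdd, CacheFileName, and the translatePage callback are hypothetical names, and it assumes the web process may write to cacheDir.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class TranslationCache
{
    // Returns the cached translation for (url, toLanguage) if one exists;
    // otherwise calls translatePage once and stores its result on disk.
    // translatePage is a hypothetical callback that produces the translated HTML.
    public static string GetOrAdd(string cacheDir, string url, string toLanguage, Func<string> translatePage)
    {
        Directory.CreateDirectory(cacheDir); // no-op when the directory already exists
        string path = Path.Combine(cacheDir, CacheFileName(url, toLanguage));
        if (File.Exists(path))
        {
            return File.ReadAllText(path, Encoding.UTF8);
        }
        string html = translatePage();
        File.WriteAllText(path, html, Encoding.UTF8);
        return html;
    }

    // Hashes the language and URL so the key is always a valid file name.
    public static string CacheFileName(string url, string toLanguage)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(toLanguage + "|" + url));
            return BitConverter.ToString(hash).Replace("-", "") + ".html";
        }
    }
}
```

A call such as TranslationCache.GetOrAdd(dir, pageUrl, "nl", () => TranslatePage(pageUrl, "nl")) would then hit the translation service only the first time a page is requested in a given language; TranslatePage here stands for whatever routine ends up doing the translation. The sketch does no locking and no expiry, so concurrent first requests for the same page may both translate it, and changed source pages need their cache file deleted.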
Best answer
This C# sample translates the HTML from a local file:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.IO;
    using HtmlAgilityPack;

    namespace TranslationAssistant.Business
    {
        class HTMLTranslationManager
        {
            public static int DoTranslation(string htmlfilename, string fromlanguage, string tolanguage)
            {
                string htmldocument = File.ReadAllText(htmlfilename);
                HtmlDocument htmlDoc = new HtmlDocument();
                htmlDoc.LoadHtml(htmldocument);
                htmlDoc.DocumentNode.SetAttributeValue("lang", TranslationServices.Core.TranslationServiceFacade.LanguageNameToLanguageCode(tolanguage));

                // Translate the page title.
                var title = htmlDoc.DocumentNode.SelectSingleNode("//head//title");
                if (title != null) title.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(title.InnerHtml, fromlanguage, tolanguage, "text/html");

                var body = htmlDoc.DocumentNode.SelectSingleNode("//body");
                if (body != null)
                {
                    if (body.InnerHtml.Length < 10000)
                    {
                        // Small enough to translate in a single call.
                        body.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(body.InnerHtml, fromlanguage, tolanguage, "text/html");
                    }
                    else
                    {
                        // Too large for one call: split into child nodes and translate those.
                        List<HtmlNode> nodes = new List<HtmlNode>();
                        AddNodes(body.FirstChild, ref nodes);
                        Parallel.ForEach(nodes, (node) =>
                        {
                            if (node.InnerHtml.Length > 10000)
                            {
                                throw new Exception("Child node with a length of more than 10000 characters encountered.");
                            }
                            node.InnerHtml = TranslationServices.Core.TranslationServiceFacade.TranslateString(node.InnerHtml, fromlanguage, tolanguage, "text/html");
                        });
                    }
                }
                // Overwrites the input file with the translated document.
                htmlDoc.Save(htmlfilename, Encoding.UTF8);
                return 1;
            }

            /// <summary>
            /// Add nodes of size smaller than 10000 characters to the list, and recurse into the bigger ones.
            /// </summary>
            /// <param name="rootnode">The node to start from</param>
            /// <param name="nodes">Reference to the node list</param>
            private static void AddNodes(HtmlNode rootnode, ref List<HtmlNode> nodes)
            {
                string[] DNTList = { "script", "#text", "code", "col", "colgroup", "embed", "em", "#comment", "image", "map", "media", "meta", "source", "xml" }; //DNT - Do Not Translate - these nodes are skipped.
                // Walk rootnode and each of its following siblings; NextSibling is null after the last one.
                HtmlNode child = rootnode;
                while (child != null)
                {
                    if (!DNTList.Contains(child.Name.ToLowerInvariant()))
                    {
                        if (child.InnerHtml.Length > 10000)
                        {
                            AddNodes(child.FirstChild, ref nodes);
                        }
                        else
                        {
                            if (child.InnerHtml.Trim().Length != 0) nodes.Add(child);
                        }
                    }
                    child = child.NextSibling;
                }
            }
        }
    }
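This comes from HTMLTranslationManager.cs in http://github.com/microsofttranslator/documenttranslator, which uses the helper function TranslateString() from TranslationServiceFacade.cs. You can simplify it and insert your own translation service call directly in place of TranslateString().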
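As a rough sketch of what swapping in your own call could look like, the helper below takes the translation call as a delegate. FragmentTranslator, TranslateAll, and the (text, from, to) delegate shape are assumptions made for this illustration; they are not part of the DocumentTranslator project or of the Translator API. It translates the fragments in parallel but collects the results into an array, so the HTML document itself is only touched from one thread afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FragmentTranslator
{
    // translate is a caller-supplied stand-in for TranslateString(); its
    // (text, from, to) -> translated text shape is an assumption of this sketch.
    // Results are returned in the same order as the input fragments.
    public static string[] TranslateAll(
        IReadOnlyList<string> fragments,
        string from,
        string to,
        Func<string, string, string, string> translate,
        int maxLength = 10000)
    {
        var results = new string[fragments.Count];
        // Each iteration writes only its own slot, so no locking is needed.
        // Exceptions thrown inside the loop surface as an AggregateException.
        Parallel.For(0, fragments.Count, i =>
        {
            if (fragments[i].Length > maxLength)
            {
                throw new ArgumentException("Fragment " + i + " exceeds " + maxLength + " characters.");
            }
            results[i] = translate(fragments[i], from, to);
        });
        return results;
    }
}
```

With the node list built by AddNodes, you could pass nodes.Select(n => n.InnerHtml).ToList() as the fragments and then assign results[i] back to nodes[i].InnerHtml in an ordinary for loop. The delegate body is where your actual service call would go.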
Source: the Stack Overflow question "c# - Translating an entire web page in C# using the Microsoft Translator API": https://stackoverflow.com/questions/33541787/