html - 使用 VBA 从谷歌翻译中提取 div 的内容

标签 html regex vba excel

我有翻译语言的功能:

Public Function Translate(rng As Range, Optional translateFrom As String = "nl", Optional translateTo As String = "en")
    Dim getParam As String, Trans As String, objHTTP As Object, URL As String
    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    getParam = ConvertToGet(rng.Value)
    URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam
    objHTTP.Open "GET", URL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ("")
    If InStr(objHTTP.responseText, "div dir=""ltr""") > 0 Then
        Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")
        Translate = CleanA(Trans)
    Else
        Translate = CVErr(xlErrValue)
    End If
End Function

中文翻译:

A1 = Hello

B1 = Translate(A1,"en","zh-cn")

结果是“Nǐ hǎo”,正确结果是“你好”

谷歌链接:https://translate.google.com/m?hl=en&sl=en&tl=zh-CN&ie=UTF-8&prev=_m&q=hello

我想要结果:

B1 = 你好

C1 = Nǐ hǎo

我想我需要修复这段代码:

Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>")

请帮帮我,谢谢!

最佳答案

页面返回两个<div>就像:

<div dir="ltr" class="o1">Nǐ hǎo</div>
<div dir="ltr" class="t0">你好</div>

最好不要尝试使用正则表达式解析 HTML,因为解析 HTML 很容易遇到困难 - 您可以使用 VBA 中的 Microsoft HTML 对象库来获得类似的结果。

获取那两个<div>的内容您可以根据以下示例使用此代码的标签:

' o1 has Anglicised translation, t0 as tranlsation in target language
Set objDivs = objHTML.getElementsByTagName("div")
For Each objDiv In objDivs
    If objDiv.className = "o1" Then
        strTranslatedO1 = objDiv.innerText
    End If
    If objDiv.className = "t0" Then
        strTranslatedT0 = objDiv.innerText
    End If
Next objDiv

这基本上是遍历所有 <div>返回的 HTML 中的标签并检查类名 o1t0然后得到 innerText属性(property)。使用这种技术,您可以获得翻译后的值并将它们写回工作表,例如:

enter image description here

完整代码:

Option Explicit

Public Sub Test()

    Dim ws As Worksheet
    
    ' testing worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    ws.Cells.Delete
    
    ' test inputs
    ws.Range("A1:E1") = Array("Input", "From", "To", "T0", "O1")
    ws.Range("A2:A4") = "hello"
    ws.Range("B2:B4") = "English"
    ws.Range("C2:C4") = Application.Transpose(Array("Chinese", "Spanish", "Russian"))
    
    ' test
    ws.Range("D2") = Translate("hello", "en", "zh-cn", True)
    ws.Range("E2") = Translate("hello", "en", "zh-cn", False)
    ws.Range("D3") = Translate("hello", "en", "es", True)
    ws.Range("E3") = Translate("hello", "en", "es", False) 'Spanish uses latin alphabet
    ws.Range("D4") = Translate("hello", "en", "ru", True)
    ws.Range("E4") = Translate("hello", "en", "ru", False)

End Sub

Public Function Translate(strInput As String, strFromLanguageCode As String, strToLanguageCode As String, blnTargetAlphabet As Boolean) As String
    
    Dim strURL As String
    Dim objHTTP As Object
    Dim objHTML As Object
    Dim objDivs As Object, objDiv
    Dim strTranslatedT0 As String
    Dim strTranslatedO1 As String
   
    ' send query to web page
    strURL = "https://translate.google.com/m?hl=" & strFromLanguageCode & _
        "&sl=" & strFromLanguageCode & _
        "&tl=" & strToLanguageCode & _
        "&ie=UTF-8&prev=_m&q=" & strInput

    Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    objHTTP.Open "GET", strURL, False
    objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
    objHTTP.send ""
    
    ' create a html document
    Set objHTML = CreateObject("htmlfile")
    With objHTML
        .Open
        .Write objHTTP.responseText
        .Close
    End With
    
    ' o1 has Anglicised translation, t0 as tranlsation in target language
    Set objDivs = objHTML.getElementsByTagName("div")
    For Each objDiv In objDivs
        If objDiv.className = "o1" Then
            strTranslatedO1 = objDiv.innerText
        End If
        If objDiv.className = "t0" Then
            strTranslatedT0 = objDiv.innerText
        End If
    Next objDiv
    
    ' choose which to return
    If blnTargetAlphabet Then
        Translate = strTranslatedT0
    Else
        Translate = strTranslatedO1
    End If
    
CleanUp:
    Set objHTML = Nothing
    Set objHTTP = Nothing
    
End Function

2020 年 12 月更新

看起来这种方法可能在 11 月中旬就不再有效了。

查看响应

  • div类名已更改为更晦涩的名称
  • 有一些深奥的c-wiz元素做一些美妙的事情...
  • 另外,我怀疑某些客户端脚本在检索文档后调用实际翻译

选项:Selenium、Microsoft Translate、Google 翻译 API 的免费和付费层级;)

关于html - 使用 VBA 从谷歌翻译中提取 div 的内容,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47248969/

相关文章:

jquery - 使用 Jquery 使用默认值填充字段

html - CSS中灵活大小的垂直文本

Notepad++ 中的正则表达式 "Replace Text + Value"==> "New Text + Value"

Swift - 删除重复的标点符号

vba - vb 脚本到 javascript

jquery - jPlayer - 简化多个实例的功能

php - 如何用一个 <br/> 标签正则表达式替换多个 <br/> 标签?

excel - 在 Excel 中捕获工作表取消保护事件

vba - 当用户从下拉组合框(ActiveX)中选择值时会触发什么事件?

html - 使用 Bootstrap 响应方案的框之间的间距