html - Excel VBA scraping - HTML table not visible

Tags: html excel vba web-scraping

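I am trying to scrape data from "https://in.tradingview.com/symbols/NSE-ABB/technicals/" with Excel VBA. I do receive a response, but body.innerHTML does not contain the required table, even though I can see the table with that class name when I inspect the page in Chrome.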

What is wrong with the code?

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    WriteTxtFile sResponse
    With html
        .body.innerHTML = sResponse
        Set tElementC = .getElementsByClassName("table-1i1M26QY- maTable-27Z4Dq6Y- tableWithAction-2OCRQQ8y-")(0).getElementsByTagName("td")
    End With

URL --> https://in.tradingview.com/symbols/NSE-ABB/technicals/
Class name to access = "table-1i1M26QY- maTable-27Z4Dq6Y- tableWithAction-2OCRQQ8y-"

Best answer

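The page source HTML at the provided link https://in.tradingview.com/symbols/NSE-ABB/technicals/ does not contain the necessary data; the page loads it via AJAX. The website has an API available, and the response is returned in JSON format. So you first need to do some reverse engineering to find out how the website works. In a browser, e.g. Chrome, press F12 to open DevTools, navigate to the webpage, go to the Network tab, and set the filter to XHR, as shown below: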

[Screenshot: network tab]

Examine the logged responses. The largest one actually contains all the necessary data:

[Screenshot: json response]

To make such an XHR yourself, you need to keep the entire payload structure and add the relevant headers:

[Screenshot: headers and form data]

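In the form data section, the array holds a lot of quote field titles, so you can pick the ones you actually need. You may find more available titles: click the Initiator link (first screenshot above) and you will see the JS code that initiated that XHR. Click Pretty print {} at the bottom to make the code readable. Type any title you have already extracted from the form data, e.g. Recommend.Other, into the search box and find the others next to it in the code: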

[Screenshot: quote field titles]
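As a minimal sketch of picking only the fields you need, the payload can also be composed with plain string concatenation when the titles are simple strings. The helper name BuildScanPayload is hypothetical. The payload layout is copied from the form data described above, and the sketch assumes that the ticker and titles contain no quotes or backslashes that would need JSON escaping.

```vba
Option Explicit

' Hypothetical helper: builds the scan payload for one ticker and a list of column titles.
' Assumes the ticker and titles need no JSON escaping (no quotes or backslashes).
Function BuildScanPayload(ByVal sTicker As String, ByVal aColumns As Variant) As String
    Dim sColumns As String
    If UBound(aColumns) < LBound(aColumns) Then
        ' Empty list of titles
        sColumns = "[]"
    Else
        ' Join the titles into a JSON array of strings, e.g. ["EMA5","SMA5"]
        sColumns = "[""" & Join(aColumns, """,""") & """]"
    End If
    BuildScanPayload = "{""symbols"":{""tickers"":[""" & sTicker & """],""query"":{""types"":[]}}," & _
                       """columns"":" & sColumns & "}"
End Function

Sub DemoBuildScanPayload()
    ' Prints the composed payload to the Immediate window
    Debug.Print BuildScanPayload("NSE:ABB", Array("EMA5", "SMA5"))
End Sub
```

For titles that might contain special characters, a JSON serializer is the safer choice.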

Here is a VBA example showing how such scraping can be done. Import the JSON.bas module into the VBA project for JSON processing.

Option Explicit

Sub Test()

    Dim aQuoteFieldTitles()
    Dim aQuoteFieldData()
    Dim sPayload As String
    Dim sJSONString As String
    Dim vJSON
    Dim sState As String
    Dim i As Long

    ' Put the necessary field titles into array
    aQuoteFieldTitles = Array( _
        "name", "description", "country", "type", "after_tax_margin", "average_volume", "average_volume_30d_calc", "average_volume_60d_calc", "average_volume_90d_calc", "basic_eps_net_income", "beta_1_year", "beta_3_year", "beta_5_year", "current_ratio", "debt_to_assets", "debt_to_equity", "dividends_paid", "dividends_per_share_fq", _
        "dividends_yield", "dps_common_stock_prim_issue_fy", "earnings_per_share_basic_ttm", "earnings_per_share_diluted_ttm", "earnings_per_share_forecast_next_fq", "earnings_per_share_fq", "earnings_release_date", "earnings_release_next_date", "ebitda", "enterprise_value_ebitda_ttm", "enterprise_value_fq", "exchange", "expected_annual_dividends", _
        "gross_margin", "gross_profit", "gross_profit_fq", "industry", "last_annual_eps", "last_annual_revenue", "long_term_capital", "market_cap_basic", "market_cap_calc", "net_debt", "net_income", "number_of_employees", "number_of_shareholders", "operating_margin", _
        "pre_tax_margin", "preferred_dividends", "price_52_week_high", "price_52_week_low", "price_book_ratio", "price_earnings_ttm", "price_revenue_ttm", "price_sales_ratio", "quick_ratio", "return_of_invested_capital_percent_ttm", "return_on_assets", "return_on_equity", "return_on_invested_capital", "revenue_per_employee", "sector", _
        "eps_surprise_fq", "eps_surprise_percent_fq", "total_assets", "total_capital", "total_current_assets", "total_debt", "total_revenue", "total_shares_outstanding_fundamental", "volume", "relative_volume", "pre_change", "post_change", "close", "open", "high", "low", "gap", "price_earnings_to_growth_ttm", "price_sales", "price_book_fq", _
        "price_free_cash_flow_ttm", "float_shares_outstanding", "total_shares_outstanding", "change_from_open", "change_from_open_abs", "Perf.W", "Perf.1M", "Perf.3M", "Perf.6M", "Perf.Y", "Perf.YTD", "Volatility.W", "Volatility.M", "Volatility.D", "RSI", "RSI7", "ADX", "ADX+DI", "ADX-DI", "ATR", "Mom", "High.All", "Low.All", "High.6M", "Low.6M", _
        "High.3M", "Low.3M", "High.1M", "Low.1M", "EMA5", "EMA10", "EMA20", "EMA30", "EMA50", "EMA100", "EMA200", "SMA5", "SMA10", "SMA20", "SMA30", "SMA50", "SMA100", "SMA200", "Stoch.K", "Stoch.D", "MACD.macd", "MACD.signal", "Aroon.Up", "Aroon.Down", "BB.upper", "BB.lower", "goodwill", "debt_to_equity_fq", "CCI20", "DonchCh20.Upper", _
        "DonchCh20.Lower", "HullMA9", "AO", "Pivot.M.Classic.S3", "Pivot.M.Classic.S2", "Pivot.M.Classic.S1", "Pivot.M.Classic.Middle", "Pivot.M.Classic.R1", "Pivot.M.Classic.R2", "Pivot.M.Classic.R3", "Pivot.M.Fibonacci.S3", "Pivot.M.Fibonacci.S2", "Pivot.M.Fibonacci.S1", "Pivot.M.Fibonacci.Middle", "Pivot.M.Fibonacci.R1", _
        "Pivot.M.Fibonacci.R2", "Pivot.M.Fibonacci.R3", "Pivot.M.Camarilla.S3", "Pivot.M.Camarilla.S2", "Pivot.M.Camarilla.S1", "Pivot.M.Camarilla.Middle", "Pivot.M.Camarilla.R1", "Pivot.M.Camarilla.R2", "Pivot.M.Camarilla.R3", "Pivot.M.Woodie.S3", "Pivot.M.Woodie.S2", "Pivot.M.Woodie.S1", "Pivot.M.Woodie.Middle", "Pivot.M.Woodie.R1", _
        "Pivot.M.Woodie.R2", "Pivot.M.Woodie.R3", "Pivot.M.Demark.S1", "Pivot.M.Demark.Middle", "Pivot.M.Demark.R1", "KltChnl.upper", "KltChnl.lower", "P.SAR", "Value.Traded", "MoneyFlow", "ChaikinMoneyFlow", "Recommend.All", "Recommend.MA", "Recommend.Other", "Stoch.RSI.K", "Stoch.RSI.D", "W.R", "ROC", "BBPower", "UO", "Ichimoku.CLine", _
        "Ichimoku.BLine", "Ichimoku.Lead1", "Ichimoku.Lead2", "VWMA", "ADR", "RSI[1]", "Stoch.K[1]", "Stoch.D[1]", "CCI20[1]", "ADX-DI[1]", "AO[1]", "Mom[1]", "Rec.Stoch.RSI", "Rec.WR", "Rec.BBPower", "Rec.UO", "Rec.Ichimoku", "Rec.VWMA", "Rec.HullMA9" _
    )

    ' Field titles exactly as in the table MOVING AVERAGES
    ' aQuoteFieldTitles = Array("EMA5", "SMA5", "EMA10", "SMA10", "EMA20", "SMA20", "EMA30", "SMA30", "EMA50", "SMA50", "EMA100", "SMA100", "EMA200", "SMA200", "Ichimoku.BLine", "VWMA", "HullMA9")

    ' Compose payload
    sPayload = "{""symbols"":{""tickers"":[""NSE:ABB""],""query"":{""types"":[]}},""columns"":" & JSON.Serialize(aQuoteFieldTitles) & "}"
    ' Retrieve JSON response
    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", "https://scanner.tradingview.com/india/scan", True
        .setRequestHeader "content-type", "application/x-www-form-urlencoded"
        .setRequestHeader "user-agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
        .setRequestHeader "content-length", Len(sPayload)
        .send (sPayload)
        Do Until .readyState = 4: DoEvents: Loop
        sJSONString = .responseText
    End With
    ' Parse JSON response
    JSON.Parse sJSONString, vJSON, sState
    ' Check response validity
    Select Case True
        Case sState <> "Object"
            MsgBox "Invalid JSON response"
        Case IsNull(vJSON("data"))
            MsgBox vJSON("error")
        Case Else
            ' Output data to worksheet #1
            aQuoteFieldData = vJSON("data")(0)("d")
            With ThisWorkbook.Sheets(1)
                .Cells.Delete
                .Cells.WrapText = False
                For i = 0 To UBound(aQuoteFieldTitles)
                    .Cells(i + 1, 1).Value = aQuoteFieldTitles(i)
                    .Cells(i + 1, 2).Value = aQuoteFieldData(i)
                Next
                .Columns.AutoFit
            End With
            MsgBox "Completed"
    End Select

End Sub
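The wait loop in the example above never exits if the request does not complete. A hedged sketch of the same asynchronous POST with a deadline and an HTTP status check could look like this. The helper name PostWithTimeout, the 30-second default, and the error numbers are assumptions for illustration and are not part of the original answer.

```vba
Option Explicit

' Hypothetical helper: POSTs sPayload to sUrl and returns the response text.
' Raises an error if the request is not complete within lTimeoutSec seconds
' or if the HTTP status is not 200.
Function PostWithTimeout(ByVal sUrl As String, ByVal sPayload As String, _
                         Optional ByVal lTimeoutSec As Long = 30) As String
    Dim dDeadline As Date
    dDeadline = DateAdd("s", lTimeoutSec, Now)
    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", sUrl, True
        ' Same content type as in the example above
        .setRequestHeader "content-type", "application/x-www-form-urlencoded"
        .send sPayload
        ' Wait for completion, but give up once the deadline has passed
        Do Until .readyState = 4
            If Now > dDeadline Then
                .abort
                Err.Raise vbObjectError + 513, "PostWithTimeout", "Request timed out"
            End If
            DoEvents
        Loop
        If .Status <> 200 Then
            Err.Raise vbObjectError + 514, "PostWithTimeout", _
                "HTTP status " & .Status & " " & .statusText
        End If
        PostWithTimeout = .responseText
    End With
End Function
```

With this helper, the request block in Test() could be reduced to sJSONString = PostWithTimeout("https://scanner.tradingview.com/india/scan", sPayload).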

The output for me is as follows:

[Screenshot: output]
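The cell-by-cell loop in the example works, but writing the whole block in a single range assignment is usually faster on long field lists. The following is a sketch only. The helper name PairsTo2D is hypothetical, the sample values are made up, and it assumes zero-based one-dimensional arrays (the default Option Base 0) holding scalar or Null values.

```vba
Option Explicit

' Hypothetical helper: pairs titles with values in a 1-based, two-column 2D array
' so the result can be assigned to a worksheet range in one operation.
' Assumes both arrays are zero-based and one-dimensional; missing values stay Empty.
Function PairsTo2D(ByVal aTitles As Variant, ByVal aData As Variant) As Variant
    Dim aOut() As Variant
    Dim i As Long
    ReDim aOut(1 To UBound(aTitles) + 1, 1 To 2)
    For i = 0 To UBound(aTitles)
        aOut(i + 1, 1) = aTitles(i)
        If i <= UBound(aData) Then aOut(i + 1, 2) = aData(i)
    Next
    PairsTo2D = aOut
End Function

Sub DemoPairsTo2D()
    Dim aOut As Variant
    ' Made-up sample values, for illustration only
    aOut = PairsTo2D(Array("EMA5", "SMA5"), Array(1.5, 2.5))
    With ThisWorkbook.Sheets(1)
        ' One assignment writes all rows and both columns
        .Range("A1").Resize(UBound(aOut, 1), 2).Value = aOut
    End With
End Sub
```

In Test(), the same call would take aQuoteFieldTitles and aQuoteFieldData as its arguments.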

By the way, a similar approach is applied in other answers.

Regarding html - Excel VBA scraping - HTML table not visible, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54497257/
