预先感谢您的帮助。我运行的是 Windows 8.1,拥有最新的 IE/Chrome 浏览器和最新的 Excel。我正在尝试编写一个从 StackOverflow ( https://stackoverflow.com/tags ) 提取数据的 Excel 宏。具体来说,我试图提取日期(运行宏的日期)、标签名称、标签数量以及标签的简要描述。我让它适用于表格的第一页,但不适用于其余页面(目前有 1132 页)。现在,每次运行宏时它都会覆盖数据,并且我不确定如何让它在运行之前查找下一个空单元格。最后,我试图让它每周自动运行一次。
我非常感谢这里的任何帮助。问题是:
- 从网络表格中提取首页以外的数据
- 使其将数据抓取到下一个空行而不是覆盖
- 使宏每周自动运行一次
代码(到目前为止)如下。谢谢!
Enum READYSTATE
READYSTATE_UNINITIALIZED = 0
READYSTATE_LOADING = 1
READYSTATE_LOADED = 2
READYSTATE_INTERACTIVE = 3
READYSTATE_COMPLETE = 4
End Enum
Sub ImportStackOverflowData()
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://stackoverflow.com/tags"
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to StackOverflow ..."
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'clear old data out and put titles in
'Cells.Clear
'put heading across the top of row 3
Range("A3").Value = "Date Pulled"
Range("B3").Value = "Keyword"
Range("C3").Value = "# Of Tags"
'Range("C3").Value = "Asked This Week"
Range("D3").Value = "Description"
Dim TagList As IHTMLElement
Dim Tags As IHTMLElementCollection
Dim Tag As IHTMLElement
Dim RowNumber As Long
Dim TagFields As IHTMLElementCollection
Dim TagField As IHTMLElement
Dim Keyword As String
Dim NumberOfTags As String
'Dim AskedThisWeek As String
Dim TagDescription As String
'Dim QuestionFieldLinks As IHTMLElementCollection
Dim TodaysDate As Date
Set TagList = html.getElementById("tags-browser")
Set Tags = html.getElementsByClassName("tag-cell")
RowNumber = 4
For Each Tag In Tags
'if this is the tag containing the details, process it
If Tag.className = "tag-cell" Then
'get a list of all of the parts of this question,
'and loop over them
Set TagFields = Tag.all
For Each TagField In TagFields
'if this is the keyword, store it
If TagField.className = "post-tag" Then
'store the text value
Keyword = TagField.innerText
Cells(RowNumber, 2).Value = TagField.innerText
End If
If TagField.className = "item-multiplier-count" Then
'store the integer for number of tags
NumberOfTags = TagField.innerText
'NumberOfTags = Replace(NumberOfTags, "x", "")
Cells(RowNumber, 3).Value = Trim(NumberOfTags)
End If
If TagField.className = "excerpt" Then
Description = TagField.innerText
Cells(RowNumber, 4).Value = TagField.innerText
End If
TodaysDate = Format(Now, "MM/dd/yy")
Cells(RowNumber, 1).Value = TodaysDate
Next TagField
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Next
Set html = Nothing
'do some final formatting
Range("A3").CurrentRegion.WrapText = False
Range("A3").CurrentRegion.EntireColumn.AutoFit
Range("A1:C1").EntireColumn.HorizontalAlignment = xlCenter
Range("A1:D1").Merge
Range("A1").Value = "StackOverflow Tag Trends"
Range("A1").Font.Bold = True
Application.StatusBar = ""
MsgBox "Done!"
End Sub
最佳答案
当 Stack Overflow 通过数据浏览器等工具向您提供基础数据时,无需抓取 Stack Overflow。在数据资源管理器中使用此查询应该会得到您需要的结果:
select t.TagName, t.Count, p.Body
from Tags t inner join Posts p
on t.ExcerptPostId = p.Id
order by t.count desc;
该查询的永久链接是 here查询运行后出现的“下载 CSV”选项可能是将数据导入 Excel 的最简单方法。如果您想自动化这部分工作,则 CSV 结果下载的直接链接是 here
关于vba - Excel VBA 宏 : Scraping data from site table that spans multiple pages,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/26664701/