Excel VBA - Scrape Google using XMLHTTP (MSXML2.XMLHTTP)

1.1k views Asked by At

I am trying to pull together a daily table of events for certain news topics on Google News.

In a single module I have the following:

Option Explicit
Private Declare Function URLDownloadToFile Lib "urlmon" _
Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
ByVal szURL As String, ByVal szFileName As String, _
ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Dim ret As Long

Sub Go()

Dim url As String, i As Integer, numb_H3 As Integer, lastRow As Long, XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object, j1 As Object

url = "https://www.google.co.uk/search?q=" & "Wearables" & "&tbm=nws" ' "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getElementById("rso")
numb_H3 = objResultDiv.GetElementsByTagname("H3").Length
For i = 0 To numb_H3 - 1
If numb_H3 > 0 Then
    Set objH3 = objResultDiv.GetElementsByTagname("H3")(i)
    Set link = objH3.GetElementsByTagname("a")(0)

'get thumbnail image location
    Cells(ActiveCell.Row + i, 1).Value = objResultDiv.GetElementsByTagname("img")(i).src
'get news title    
    Cells(ActiveCell.Row + i, 2).Value = objH3.InnerText
'get news link
    Cells(ActiveCell.Row + i, 3).Value = link.href
'get source name
    Cells(ActiveCell.Row + i, 5).Value = "need help"
'get source time
    Cells(ActiveCell.Row + i, 6).Value = "need help"
'get news paragraph
    Cells(ActiveCell.Row + i, 7).Value = "need help"

End If
DoEvents
Next i

html.Close

End Sub

I am able to return the following objects:

enter image description here

I know where the objects are in red that I am trying to get, I'm just struggling with the syntax when using GetElementsByClassName :enter image description here

so for example, I know that the text "ZDNet" lies in:

?...GetElementsByClassName("slp")(i).GetElementsByTagname("span")(0).InnerText

And that the date "7 Jan 2017" lies in:

?...GetElementsByClassName("slp")(i).GetElementsByTagname("span")(2).InnerText

But I can't get the correct syntax.

I'm hoping I've made a really simple mistake, but I am also open to other methods if they are more efficient.

Thanks for reading, Mr. J

0

There are 0 answers