Access DOM of google.com using Lua script in Splash


I am trying to run a Lua script in Splash that performs a Google search and takes a screenshot of the search results. When I try to select the Google search box using an XPath or CSS selector in my Lua script, I get this error:

{
    "error": 400,
    "type": "ScriptError",
    "description": "Error happened while executing Lua script",
    "info": {
        "message": "[string \"function main(splash, args)\r...\"]:9: cannot select the specified element {'type': 'JS_ERROR', 'js_error_type': 'SyntaxError', 'js_error_message': 'SyntaxError: DOM Exception 12', 'js_error': 'Error: SyntaxError: DOM Exception 12', 'message': \"JS error: 'Error: SyntaxError: DOM Exception 12'\"}",
        "type": "SPLASH_LUA_ERROR",
        "splash_method": "select",
        "source": "[string \"function main(splash, args)\r...\"]",
        "line_number": 9,
        "error": "cannot select the specified element {'type': 'JS_ERROR', 'js_error_type': 'SyntaxError', 'js_error_message': 'SyntaxError: DOM Exception 12', 'js_error': 'Error: SyntaxError: DOM Exception 12', 'message': \"JS error: 'Error: SyntaxError: DOM Exception 12'\"}"
    }
}

This is my Lua script:

function main(splash, args)

  splash.private_mode_enabled = false
  splash:set_user_agent("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0")
  
  assert(splash:go(args.url))
  assert(splash:wait(1.0))

  search_box = assert(splash:select("//div[@class='a4bIc']/input"))
  search_box:focus()
  search_box:send_text('my user agent')
  search_box:send_keys('<Enter>')
  assert(splash:wait(2.0))
  
  return splash:png()
end

I tried setting custom headers and running the script in private mode, but nothing works. However, the same script runs without error and produces the correct output against duckduckgo.com; the problem only appears when the target URL is google.com. I think Google detects that the browser is being controlled by a bot (script) and disables access to the DOM tree.

Any idea how to make it work?


There are 2 answers

Doyousketch2 On

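There's something wrong with your selector:

"//div[@class='a4bIc']/input"

That string is XPath, but splash:select expects a CSS selector, which is why the underlying querySelector call fails with a SyntaxError (DOM Exception 12). Open the webpage, press F12, and use the inspector to find which div class to target for that input field. It's also possible that the class name is generated on the fly to obfuscate it.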
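A minimal sketch of what a CSS-based lookup could look like, since splash:select takes a CSS selector rather than XPath. The selector `input[name='q']` is an assumption made for illustration, not something confirmed by the question; check the inspector for the attribute the search box actually carries.

```lua
function main(splash, args)
  splash.private_mode_enabled = false
  assert(splash:go(args.url))
  assert(splash:wait(1.0))

  -- splash:select expects a CSS selector, not XPath.
  -- "input[name='q']" is an assumed selector; verify it in the inspector.
  local search_box = splash:select("input[name='q']")
  if not search_box then
    -- select returns nil when nothing matches
    return {error = "search box not found"}
  end

  search_box:focus()
  search_box:send_text("my user agent")
  search_box:send_keys("<Enter>")
  assert(splash:wait(2.0))

  return splash:png()
end
```

Checking for nil before using the element separates "the selector is valid but matched nothing" from "the selector itself is malformed", which is the case that raises the error in the question.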

Doyousketch2 On

Maybe the page hasn't fully downloaded or rendered yet. You can poll for the element until it appears or a timeout is reached:

function main(splash, args)
    splash.private_mode_enabled = false
    splash:set_user_agent("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0")

    -- no assert here, so a failed load reaches the else branch below
    local ok, reason = splash:go(args.url)

    if ok then
        -- splash:select takes a CSS selector, not XPath
        local selector = "div.a4bIc > input"
        local wait, increment, maxwait = 0, 0.1, 10
        while wait < maxwait and not splash:select(selector) do
            splash:wait(increment)  --  wait until it exists, or times out
            wait = wait + increment
        end
        if wait >= maxwait then
            print('Timed out')
        else
            local search_box = splash:select(selector)
            search_box:focus()
            search_box:send_text('my user agent')
            search_box:send_keys('<Enter>')
            splash:wait(2.0)
            return splash:png()
        end
    else
        print( reason )  --  see if it tells you why
    end
end
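The polling idea above can also be factored into a small helper. This is only a sketch: `wait_for_element` is a hypothetical name (not part of Splash), and the CSS selector `div.a4bIc > input` is just the question's class name rewritten as CSS, so it may no longer match if Google generates class names dynamically.

```lua
-- Poll for a CSS selector until it matches or the timeout elapses.
-- wait_for_element is a hypothetical helper, not a Splash API.
local function wait_for_element(splash, css, timeout, step)
  local waited = 0
  while waited < timeout do
    local element = splash:select(css)
    if element then
      return element
    end
    splash:wait(step)
    waited = waited + step
  end
  return nil  -- nothing matched within the timeout
end

function main(splash, args)
  local ok, reason = splash:go(args.url)
  if not ok then
    return {error = "page load failed: " .. tostring(reason)}
  end

  -- Assumed selector, taken from the class name in the question.
  local search_box = wait_for_element(splash, "div.a4bIc > input", 10, 0.1)
  if not search_box then
    return {error = "timed out waiting for the search box"}
  end

  search_box:focus()
  search_box:send_text("my user agent")
  search_box:send_keys("<Enter>")
  splash:wait(2.0)
  return splash:png()
end
```

Returning a table with an `error` field instead of printing means the failure reason shows up in the HTTP response from Splash rather than only in the server log.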