Mojo::UserAgent and JavaScript

I am wondering whether something like the following is possible with Mojo::UserAgent.

Let's say I have the code below:

my $ua  = Mojo::UserAgent->new;
my $res = $ua->get('mojolicious.org/perldoc')->result;

Is it possible to intercept a Mojo::UserAgent request and send it to another web client that understands JavaScript, with its result sent back as a Mojo::Transaction::HTTP (the object whose result is $res above), so that the user can keep using the Mojo::UserAgent interface on the results?

That is, I want the following:

Mojo::UserAgent -> HTTP request -> intercept the HTTP request -> send the HTTP request to a web client that supports JavaScript, like WWW::Mechanize::Chrome or Firefox::Marionette -> the JavaScript web client performs the request -> the returned result is intercepted and converted to a Mojo::Transaction::HTTP

or

Mojo::UserAgent -> non-blocking HTTP request -> non-blocking HTTP response -> send to an embedded web browser like WebKit -> get the result as a Mojo::Transaction::HTTP

Any ideas or examples of how to make Mojo::UserAgent work with JavaScript?

There are 2 answers

brian d foy

It's almost always possible, but the real question is how much work it would take. Part of the answer is how you want to intercept requests. That's the easy part, because you can wrap the start method (as Mojo::UserAgent::Role::Queued does).

After you intercept the request, do whatever you like. Get the raw response and have Mojo parse it and build that part of the transaction. After that you re-enter the normal progression.
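
The wrap-and-repackage idea can be sketched independently of Mojolicious. The block below is only an illustration in JavaScript (the language of the script further down this page), not Mojo code: PlainAgent, interceptStart, and the render callback are hypothetical names, and the transaction-like object is a stand-in for whatever the real client returns.

```javascript
// Conceptual sketch only; every name here is hypothetical and none of it
// is Mojolicious API.
class PlainAgent {
  // Stand-in for the normal fetch path: returns a transaction-like object.
  async start(request) {
    return {
      request,
      response: { code: 200, body: `<html>plain ${request.url}</html>` }
    };
  }
}

// Replace agent.start with a wrapper that hands the URL to `render`
// (for example a headless browser) and repackages the raw result in the
// same shape that start() normally returns.
function interceptStart(agent, render) {
  const original = agent.start.bind(agent);
  agent.start = async function (request) {
    const html = await render(request.url);
    // No rendered result: re-enter the normal path untouched.
    if (html === undefined) return original(request);
    // A real renderer would also report status and headers; 200 is assumed.
    return { request, response: { code: 200, body: html } };
  };
  return agent;
}
```

Callers keep calling start() and reading response.body as before, which is the property the question asks for; the hard part in practice is filling in the response faithfully.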

For a while people recommended the headless browser PhantomJS, but that project appears to have stalled. There's Joel Berger's Mojo::Phantom, but that's not quite what you want.

Lastly, remember that almost everyone would like this to exist but it doesn't. That's important information there. ;)

If you still want to work on this, asking more narrowly scoped questions along the way is likely to help more.

Good luck!

Yordan Georgiev

Not exactly what you asked, but something probably close enough can be achieved by:

  • installing chrome-headless
  • setting up nodejs plus a couple of modules
  • getting the outerHTML of the page after it has loaded and its client-side scripts have run
  • reading that outerHTML from Perl code

as follows:

# just a Perl one-liner: it takes the scraped HTML and passes it to Mojo::DOM
perl -MMojo::DOM -e '$s=`node scrap-html.js`; for my $e (Mojo::DOM->new($s)->find("html body a.scroll")->each){ print $e->text}';

where the code for scrap-html.js is:

  // file: scrap-html.js src: https://gist.github.com/magician11/a979906401591440bd6140bd14260578
  const CDP = require('chrome-remote-interface');
  const chromeLauncher = require('chrome-launcher');

  (async function() {
    const launchChrome = () =>
      chromeLauncher.launch({ chromeFlags: ['--disable-gpu', '--headless','--blink-settings=imagesEnabled=false'] });

    const chrome = await launchChrome();
    const protocol = await CDP({ port: chrome.port });
    const timeout = ms => new Promise(resolve => setTimeout(resolve, ms));

    // See API docs: https://chromedevtools.github.io/devtools-protocol/
    const { Page, Runtime, DOM } = protocol;
    await Promise.all([Page.enable(), Runtime.enable(), DOM.enable()]);

    const uri = 'https://qto.fi/qto/view/readme_doc';
    Page.navigate({ url: uri });

    // wait until the page says it's loaded...
    Page.loadEventFired(async () => {
      try {
        await timeout(4000); // give the JS some time to load

        // get the page source
        const rootNode = await DOM.getDocument({ depth: -1 });
        const pageSource = await DOM.getOuterHTML({
          nodeId: rootNode.root.nodeId
        });
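        console.log(pageSource.outerHTML);
      } catch (err) {
        // report on stderr so stdout carries only the HTML
        console.error(err);
      } finally {
        // always shut down, otherwise a failure leaves Chrome running
        protocol.close();
        chrome.kill();
      }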
    });
  })();
  //eof file: scrap-html.js
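
The fixed four-second sleep in scrap-html.js is a guess about how long the page's scripts need. A minimal sketch of an alternative, assuming "the page is ready" can be expressed as a check: poll until the check passes or a deadline expires. The helper name waitFor and its options are hypothetical and not part of chrome-remote-interface.

```javascript
// Hypothetical helper: resolve with the first truthy value returned by
// check(), or reject once timeoutMs has elapsed.
async function waitFor(check, { timeoutMs = 4000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await check();            // check may be sync or async
    if (value) return value;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    // pause before the next attempt
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Possible use inside the loadEventFired handler, in place of timeout(4000)
// (assumes the links the one-liner looks for mark the page as ready):
//   await waitFor(async () => {
//     const { result } = await Runtime.evaluate({
//       expression: 'document.querySelectorAll("a.scroll").length > 0'
//     });
//     return result.value;
//   });
```

The trade-off is that you have to know something about the target page; when you don't, the plain sleep is the only option left.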

Example of the whole setup on Ubuntu:

  # start install chromium-headless
  sudo apt-get update
  sudo apt-get install -y software-properties-common
  sudo apt-get install -y chromium-browser
  sudo apt-get update

  wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
  sudo dpkg -i google-chrome-stable_current_amd64.deb
  sudo apt --fix-broken install -y
  # stop install chromium-headless
  # start installing the nodejs + node modules
  sudo apt-get install -y nodejs npm
  sudo npm install -g chrome-remote-interface
  sudo npm install -g chrome-launcher
  export NODE_PATH=/usr/local/lib/node_modules
  # stop installing the nodejs + modules
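
Globally installed modules are generally not found by require() unless NODE_PATH includes the global module directory, which is why the export above matters. A small sketch to confirm the setup before running scrap-html.js; the function name missingModules is hypothetical, while require.resolve is standard Node.js.

```javascript
// Hypothetical sanity check: list which of the given modules Node cannot resolve.
function missingModules(names) {
  return names.filter(name => {
    try {
      require.resolve(name);   // throws if the module cannot be found
      return false;
    } catch (err) {
      return true;
    }
  });
}

const missing = missingModules(['chrome-remote-interface', 'chrome-launcher']);
if (missing.length > 0) {
  console.error('Not resolvable (check NODE_PATH): ' + missing.join(', '));
} else {
  console.log('All required modules can be resolved.');
}
```

Run it from the same shell session in which NODE_PATH was exported, since the variable is read when node starts.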