Programmatically capturing AJAX traffic with headless Chrome

Chrome officially supports running the browser in headless mode (including programmatic control via the Puppeteer API and/or the CRI library).

I've searched through the documentation, but I haven't found how to programmatically capture the AJAX traffic from the instances (i.e. start an instance of Chrome from code, navigate to a page, and access the background request/response calls and raw data, all from code, without using the developer tools or extensions).

Do you have any suggestions or examples detailing how this could be achieved? Thanks!

There are 4 answers

ebidel On BEST ANSWER

Update

As @Alejandro pointed out in the comments, resourceType is a function and its return value is lowercase:

page.on('request', request => {
  if (request.resourceType() === 'xhr') {
    // do something
  }
});
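
As a rough illustration of what "do something" might look like, here is a minimal sketch that collects the method, URL and POST body of every xhr or fetch request. The helper name recordAjaxRequests is made up for this example, and page is assumed to be a Puppeteer Page (anything with an .on('request', fn) method behaves the same way here).

```javascript
// Hypothetical helper: collects basic details of every AJAX request made by `page`.
// `page` is assumed to be a Puppeteer Page object.
function recordAjaxRequests(page) {
  const seen = [];
  page.on('request', request => {
    const type = request.resourceType();
    if (type === 'xhr' || type === 'fetch') {
      seen.push({
        type,
        method: request.method(),
        url: request.url(),
        postData: request.postData(), // undefined when the request has no body
      });
    }
  });
  return seen; // the array fills up as requests are issued
}
```

Call it before page.goto() so that requests fired during page load are not missed, then inspect the returned array once navigation has finished.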

Original answer

Puppeteer's API makes this really easy:

page.on('request', request => {
  // in the Puppeteer release this was written for, resourceType was a property
  if (request.resourceType === 'XHR') {
    // do something
  }
});

You can also intercept requests with setRequestInterception, but it's not needed in this example if you're not going to modify the requests.

There's an example of intercepting image requests that you can adapt.
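
For reference, a sketch of what such an interception setup might look like, assuming page is a Puppeteer Page. The helper names handleIntercepted and blockImages are invented for this example. Once interception is enabled, every request has to be continued, aborted or answered explicitly, otherwise it stalls.

```javascript
// Hypothetical helper: decides what to do with a single intercepted request.
function handleIntercepted(request) {
  if (request.resourceType() === 'image') {
    request.abort();    // drop image requests
  } else {
    request.continue(); // let everything else through unchanged
  }
}

// Assumes `page` is a Puppeteer Page; call this before page.goto().
async function blockImages(page) {
  await page.setRequestInterception(true);
  page.on('request', handleIntercepted);
}
```

To adapt it to AJAX traffic, you would test for 'xhr' or 'fetch' instead of 'image' and pass overrides to request.continue() rather than aborting.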

resourceTypes are defined here.

Andrei On

I finally found how to do what I wanted. It can be done with chrome-remote-interface (CRI) and Node.js. I'm attaching the minimal code required.

const CDP = require('chrome-remote-interface');

(async function () {

    // Chrome must already be running with remote debugging enabled,
    // i.e. chrome --remote-debugging-port=9222
    const protocol = await CDP({port: 9222});

    const {Page, Network} = protocol;
    await Page.enable();
    await Network.enable(); // needed to call Network.getResponseBody below

    // Network.dataReceived fires for every chunk, before the body is complete,
    // so fetch the body once Network.loadingFinished reports the request is done
    const onLoadingFinished = async (e) => {
        try {
            const response = await Network.getResponseBody({requestId: e.requestId});
            if (typeof response.body === 'string') {
                console.log(response.body);
            }
        } catch (ex) {
            console.log(ex.message);
        }
    };

    // register the listener before navigating so that no events are missed
    protocol.on('Network.loadingFinished', onLoadingFinished);

    await Page.navigate({url: 'http://localhost/'}); // your URL
})();
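
One detail worth noting: besides body, the result of Network.getResponseBody carries a base64Encoded flag, which is set for binary payloads. A small sketch of how the result might be normalised (decodeBody is a made-up helper name, not part of CRI):

```javascript
// Hypothetical helper: turns a Network.getResponseBody result into a Buffer,
// decoding the body first when the protocol reports it as base64-encoded.
function decodeBody(result) {
  return Buffer.from(result.body, result.base64Encoded ? 'base64' : 'utf8');
}
```

For text responses, decodeBody(response).toString() gives back the string; for images or other binary data the Buffer can be written straight to disk.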
ahuigo On

Puppeteer's request and response events let you capture XHR responses.

You should first check whether request.resourceType() is xhr or fetch.

page.on('response', response => {
    const isXhr = ['xhr', 'fetch'].includes(response.request().resourceType());
    if (isXhr) {
        console.log(response.url());
        // text() can reject (e.g. for redirects), so handle the error
        response.text().then(console.log).catch(err => console.log(err.message));
    }
});
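
If you only care about one particular call, the same check can be reused as a predicate for page.waitForResponse. The following is a sketch under assumptions: page is a Puppeteer Page, and the names ajaxResponseFor and captureOne as well as the trigger callback are invented for this example.

```javascript
// Hypothetical predicate factory: matches xhr/fetch responses whose URL contains `urlPart`.
function ajaxResponseFor(urlPart) {
  return response =>
    ['xhr', 'fetch'].includes(response.request().resourceType()) &&
    response.url().includes(urlPart);
}

// Assumes `page` is a Puppeteer Page and `trigger` is any function that causes
// the AJAX call (a click, a navigation, ...). Resolves with the response body text.
async function captureOne(page, urlPart, trigger) {
  const [response] = await Promise.all([
    page.waitForResponse(ajaxResponseFor(urlPart)), // start waiting first
    trigger(),
  ]);
  return response.text();
}
```

Starting the wait before running the trigger avoids missing a response that arrives very quickly.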
Sun On
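const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Open a CDP session through the public API instead of the private page["_client"]
  // (older Puppeteer releases expose this as page.target().createCDPSession())
  const pageClient = await page.createCDPSession();
  await pageClient.send('Network.enable');

  pageClient.on('Network.responseReceived', event => {
    if (event.response.url.includes('/api/chart/rank')) {
      console.log(event.response.url);
      pageClient.send('Network.getResponseBody', {
        requestId: event.requestId
      }).then(response => {
        const body = response.body;
        if (body) {
          try {
            const json = JSON.parse(body);
            console.log(json);
          } catch (e) {
            console.log('Response body is not valid JSON:', e.message);
          }
        }
      }).catch(e => console.log(e.message));
    }
  });

  // Interception is only needed if you want to modify or block requests
  await page.setRequestInterception(true);
  page.on('request', request => {
    request.continue();
  });
  await page.goto('http://www.example.com', { timeout: 0 });
})();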