I am using Puppeteer for a web scraping application. The page.evaluate function is returning null values, but the same query in the browser console returns the right values.

const puppeteer = require('puppeteer');
let scrape = async () => {
  const browser = await puppeteer.launch({headless:false});
  const page = await browser.newPage();
  var ticker = 'DIS';
  var my_url = 'https://seekingalpha.com/symbol/'  + ticker + '/momentum/moving-averages'; 
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0');
  await page.goto(my_url);


  page.on('console', msg => console.log('PAGE LOG:', msg.text()));
  const result = await page.evaluate(() => {
    const elements = Array.from(document.querySelectorAll('table tr td'));
    let links = elements.map(element => {
        return element.href
    })
    console.log(links, 'inside page.evaluate');
    return links;
  });
  browser.close();
  return result;
};

scrape().then((value) => {
  console.log(value); // Success!
});

I get the following results:

PAGE LOG: JSHandle@array inside page.evaluate
[ null, null, null, null, null, null, null, null, null, null ]

In the browser console, I get:

document.querySelectorAll('table tr td')
NodeList(10) [ td.left.left-text, td.middle.center-text, td.middle.center-text, td.middle.center-text, td.right.center-text, td.left.left-text, td.middle.center-text.red, td.middle.center-text.green, td.middle.center-text.green, td.right.center-text.green ]

I would appreciate any help.

With Thomas's suggestion, I was able to make the following adjustments, and it works now:

sma[0] = await page.$eval('table tr:nth-child(2) td:nth-child(2)', el => el.innerHTML);
sma[1] = await page.$eval('table tr:nth-child(2) td:nth-child(3)', el => el.innerHTML);
sma[2] = await page.$eval('table tr:nth-child(2) td:nth-child(4)', el => el.innerHTML);
sma[3] = await page.$eval('table tr:nth-child(2) td:nth-child(5)', el => el.innerHTML);
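
The four calls above could probably be collapsed into a single page.$$eval call. The following is only a sketch: readMovingAverages is a hypothetical helper name, and it assumes that the second table row contains only td cells and that only one table on the page matches the selector.

```javascript
// Hypothetical helper: `page` is an already-navigated Puppeteer Page.
// Assumes the second row holds only <td> cells, so that array positions
// 1..4 correspond to td:nth-child(2) .. td:nth-child(5).
async function readMovingAverages(page) {
  // $$eval runs the callback in the browser on all matching cells and
  // sends the resulting array of strings back to Node.js.
  return page.$$eval('table tr:nth-child(2) td', cells =>
    cells.slice(1, 5).map(cell => cell.innerHTML.trim())
  );
}
```

It would be used as: const sma = await readMovingAverages(page); Unlike the original calls, this version also trims surrounding whitespace from each cell.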

1 Answer

Thomas Dondorf

Your console.log(links, 'inside page.evaluate') is executing inside the browser runtime. Any data logged or sent from the browser to the Node.js environment needs to be serializable (see the docs), which is not the case for DOM elements. Therefore null is shown instead.
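
To see where the null values can come from, here is a small sketch that needs no browser. It assumes that the value returned from page.evaluate crosses the browser/Node.js boundary in a JSON-like form, which a plain JSON round trip only imitates. A td element has no href property, so element.href is undefined for every cell, and undefined inside an array becomes null when serialized. The roundTrip helper is made up for this illustration.

```javascript
// Assumption: a JSON round trip approximates how a return value travels
// from the browser back to Node.js. This is an imitation, not Puppeteer code.
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

// A <td> has no href property, so mapping cells to element.href
// yields undefined for each of them.
const hrefsFromCells = [undefined, undefined, undefined];

console.log(roundTrip(hrefsFromCells)); // [ null, null, null ]
```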

To query the elements, you could use the function page.$$(selector). Example:

const tds = await page.$$('table tr td');
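
page.$$ resolves to an array of element handles rather than plain values, so each handle still has to be evaluated to get something serializable out of it. A minimal sketch, assuming a reasonably recent Puppeteer version in which handles have an evaluate method; readCellTexts is a hypothetical helper name:

```javascript
// Hypothetical helper: `page` is an already-navigated Puppeteer Page.
async function readCellTexts(page, selector = 'table tr td') {
  const handles = await page.$$(selector); // array of ElementHandle objects
  const texts = [];
  for (const handle of handles) {
    // Run a small function in the browser for this element; it returns a
    // string, which is serializable and can be sent back to Node.js.
    const text = await handle.evaluate(el => el.textContent.trim());
    texts.push(text);
    await handle.dispose(); // release the browser-side reference
  }
  return texts;
}
```

It would be called as: const texts = await readCellTexts(page);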

The code inside the browser is working fine, though. I think your actual problem is that your code queries the td elements and then tries to map them to their href value, which td elements do not have. I assume you want to iterate over the a (anchor) elements instead, so your selector should probably be 'table tr td a'.
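
As a rough sketch of that suggestion, the links could be collected with page.$$eval, which runs the callback in the browser and returns only the resulting strings. collectLinks is a hypothetical helper name, and the selector assumes that the table cells on the page actually contain a elements:

```javascript
// Hypothetical helper: `page` is an already-navigated Puppeteer Page.
async function collectLinks(page, selector = 'table tr td a') {
  // $$eval runs the callback in the browser on all matching elements and
  // returns its serializable result (an array of strings) to Node.js.
  const hrefs = await page.$$eval(selector, anchors =>
    anchors.map(anchor => anchor.href)
  );
  // Drop empty values in case some anchors have no href attribute.
  return hrefs.filter(Boolean);
}
```

It would be called as: const links = await collectLinks(page);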