playwright - get content from multiple pages in parallel


I am trying to get the page content from multiple URLs using Playwright in a Node.js application. My code looks like this:

import { firefox } from 'playwright';

const getContent = async (url: string): Promise<string> => {
   const browser = await firefox.launch({ headless: true });
   const page = await browser.newPage();

   try {
      await page.goto(url, {
         waitUntil: 'domcontentloaded',
      });

      return await page.content();
   } finally {
      await page.close();
      await browser.close();
   }
}

const items = [
   {
      urls: ["https://www.google.com", "https://www.example.com"] 
      // other props
   },
   {
      urls: ["https://www.google.com", "https://www.example.com"] 
      // other props
   },
   // more items...
]

await Promise.all(
   items.map(async (item) => {
      const contents = [];

      // for...of iterates the URL strings; for...in would yield the array indices
      for (const url of item.urls) {
         contents.push(await getContent(url));
      }

      return contents;
   })
);

I am getting errors like "error (Page.content): Target closed.", but I noticed that if I run it once, without the loop:

const content = await getContent('https://www.example.com');

it works.

It looks like the iterations of the loop share the same browser and/or page instance, so they close or navigate away from each other's pages.

To test this, I built a web API around the getContent function. When I send two requests at (almost) the same time, one of them fails; if I send one request at a time, it always works.

Is there a way to make playwright work in parallel?


There is 1 answer

Answer by refactoreric:

I don't know whether this solves it, but I noticed two missing awaits: both firefox.launch(...) and browser.newPage() are asynchronous and need an await in front of them.

Also, you don't need to launch a new browser for every URL. Playwright supports isolated browser contexts, which are created much faster than a browser is launched. It's worth experimenting with launching the browser once, outside the getContent function, and then using

const context = await browser.newContext(); 
const page = await context.newPage();
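
Putting that together, here is a minimal sketch of what it could look like: one shared browser, with a fresh context per URL. The small BrowserLike, ContextLike and PageLike interfaces and the getAllContents helper are my own additions for illustration, not part of Playwright. They only describe the methods used here, and a real Playwright Browser should satisfy them structurally. I have not verified that this removes the "Target closed" error in your setup.

```typescript
// Minimal structural types covering only the Playwright methods used below.
// These are assumptions made for this sketch; a real Playwright Browser,
// BrowserContext and Page should be assignable to them.
interface PageLike {
  goto(url: string, options?: { waitUntil?: 'domcontentloaded' }): Promise<unknown>;
  content(): Promise<string>;
}

interface ContextLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

interface BrowserLike {
  newContext(): Promise<ContextLike>;
}

// Fetch one URL in its own isolated context on the shared browser.
async function getContent(browser: BrowserLike, url: string): Promise<string> {
  const context = await browser.newContext();
  try {
    const page = await context.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.content();
  } finally {
    // Closing the context also closes the pages that belong to it.
    await context.close();
  }
}

// Hypothetical helper: fetch all URLs concurrently, keeping the input order.
async function getAllContents(browser: BrowserLike, urls: string[]): Promise<string[]> {
  return Promise.all(urls.map((url) => getContent(browser, url)));
}

// Usage with the real library (assumes the `playwright` package is installed):
//   import { firefox } from 'playwright';
//   const browser = await firefox.launch({ headless: true });
//   try {
//     const contents = await getAllContents(browser, item.urls);
//   } finally {
//     await browser.close();
//   }
```

The browser is launched and closed exactly once by the caller, so no call to getContent can close a browser that another call is still using. Each call only closes the context it created.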