Using Puppeteer to scrape a public API only when the data changes


My code uses Puppeteer to scrape a public API (I can't get axios to work). It takes product prices and saves the data into a SQLite3 database.

The API has a view limit in every query, so it scrapes:

It does this every hour and compares the data in the database with the newly scraped data to look for differences. (Prices DO change every now and then, but not very frequently.)
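To make that comparison step concrete, here is a minimal sketch. The helper name `diffPrices` is hypothetical, and it assumes both the stored rows and the scraped items are arrays of objects that carry the `productId`, `price` and `salePrice` fields used in the code below.

```javascript
// Hypothetical helper: compare rows already in the database with freshly scraped items.
// Assumes each object has productId, price and salePrice fields.
function diffPrices(storedItems, scrapedItems) {
    // Index the stored rows by productId for constant-time lookups.
    const storedById = new Map(storedItems.map(item => [item.productId, item]));
    const changes = [];

    for (const item of scrapedItems) {
        const previous = storedById.get(item.productId);
        if (!previous) {
            // Product was not in the database yet.
            changes.push({ type: 'new', item });
        } else if (previous.price !== item.price || previous.salePrice !== item.salePrice) {
            // Product exists but one of its prices differs.
            changes.push({ type: 'changed', previous, item });
        }
    }
    return changes;
}
```

Only the entries returned by this function would then need to be written back, so an hourly run with no price changes results in no database writes.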

Is there a way to scrape the URL items ONLY if there are changes in the data?

I'm looking into caching concepts but I can't figure out how to do it. Is there different logic to tackle this project?
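One caching idea I have been considering is to fingerprint each response and skip the database work when the fingerprint is unchanged. The sketch below is only an illustration under assumptions: `hasChanged` and the in-memory `lastHashes` map are hypothetical names, and the key format is made up. It does not avoid the request itself. Skipping the fetch would require the server to support something like conditional requests (`ETag` / `If-None-Match`), and I don't know whether this API does.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory cache: query key -> SHA-256 hash of the last items seen.
// It is lost on restart; persisting it (e.g. in SQLite) is left out of this sketch.
const lastHashes = new Map();

// Returns true when the items for this key differ from the previous call,
// and remembers the new hash. The first call for a key always returns true.
function hasChanged(key, items) {
    const hash = crypto
        .createHash('sha256')
        .update(JSON.stringify(items))
        .digest('hex');

    if (lastHashes.get(key) === hash) {
        return false;
    }
    lastHashes.set(key, hash);
    return true;
}
```

With something like this, `saveToDatabase` would only be called when `hasChanged(`${queryItem}:${start}`, filteredItems)` returns true. The check relies on the API returning items in a stable order with stable field values, since any difference in the serialized JSON counts as a change.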

async function fetchData(browser, queryItem, start) {
    if (!browser) {
        console.error('Browser instance is not initialized');
        return;
    }
    const page = await browser.newPage();

    try {
        const url = `${baseURL}?query=${queryItem}&start=${start}`;
        await page.goto(url, { waitUntil: 'networkidle0' });

        const data = await page.evaluate(() => JSON.parse(document.body.innerText));
        // Default to an empty array so the type matches the result of .map() below.
        let filteredItems = [];
        if(data.raw.itemList.items){
            filteredItems = data.raw.itemList.items.map(item => ({
                link: item.link,
                displayName: item.displayName,
                productId: item.productId,
                price: item.price,
                salePrice: item.salePrice,
                salePercentage: item.salePercentage,
                imageUrl: item.image.src
            }));
        }     

        await saveToDatabase(filteredItems);

        return {
            totalCount: data.raw.itemList.count,
            items: filteredItems
        };
    } catch (error) {
        console.error('Error fetching data:', error);
    } finally {
        await page.close();
    }
}