My code uses Puppeteer to scrape a public API (I can't get axios to work). It takes the products' prices and saves the data into a SQLite3 database.
The API has a view limit in every query, so it scrapes:
- www.apiexample.com/api/query=shoes/start=0
- www.apiexample.com/api/query=shoes/start=48
- www.apiexample.com/api/query=shoes/start=96 ... there are 5200 products
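For reference, the pagination above can be generated rather than hard-coded. This is just a sketch; `pageStarts` is an illustrative helper name, and the page size of 48 and total of ~5200 come from the description above.

```javascript
// Sketch: compute the start offsets for a paginated API (48 items per page).
// pageStarts is a made-up helper, not part of the original code.
function pageStarts(total, pageSize = 48) {
  const starts = [];
  for (let s = 0; s < total; s += pageSize) {
    starts.push(s);
  }
  return starts;
}
```

With ~5200 products that yields 109 pages (`0, 48, 96, …`), which is why cutting down on unnecessary fetches matters.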
It does this every hour and compares the database data with the newly scraped data to look for differences. (Prices DO change every now and then, though not very frequently.)
Is there a way to scrape the URL items ONLY if there are changes in the data?
I'm looking into caching concepts but I can't figure out how to apply them here. Is there a different way to tackle this project?
// Note: baseURL and saveToDatabase are assumed to be defined elsewhere.
async function fetchData(browser, queryItem, start) {
  if (!browser) {
    console.error('Browser instance is not initialized');
    return;
  }
  const page = await browser.newPage();
  try {
    const url = `${baseURL}?query=${queryItem}&start=${start}`;
    await page.goto(url, { waitUntil: 'networkidle0' });
    const data = await page.evaluate(() => JSON.parse(document.body.innerText));
    // .map() returns an array, so initialize as one (the original used {})
    let filteredItems = [];
    if (data.raw.itemList.items) {
      filteredItems = data.raw.itemList.items.map(item => ({
        link: item.link,
        displayName: item.displayName,
        productId: item.productId,
        price: item.price,
        salePrice: item.salePrice,
        salePercentage: item.salePercentage,
        imageUrl: item.image.src
      }));
    }
    await saveToDatabase(filteredItems);
    return {
      totalCount: data.raw.itemList.count,
      items: filteredItems
    };
  } catch (error) {
    console.error('Error fetching data:', error);
  } finally {
    await page.close();
  }
}
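For the hourly comparison itself, once a page's items are in hand you can diff them against the previous snapshot keyed by `productId`, so only the products whose prices actually moved get updated. A sketch under the same item shape as `filteredItems` above; `diffByProductId` is an illustrative name, not part of the original code:

```javascript
// Sketch: return the items that are new or whose price/salePrice changed
// since the previous snapshot. Both arguments are arrays of item objects
// with the shape produced by fetchData's filteredItems.
function diffByProductId(oldItems, newItems) {
  const prev = new Map(oldItems.map(item => [item.productId, item]));
  return newItems.filter(item => {
    const old = prev.get(item.productId);
    return !old || old.price !== item.price || old.salePrice !== item.salePrice;
  });
}
```

Feeding only the returned items to `saveToDatabase` (as an upsert) keeps the hourly run cheap even when most prices are unchanged.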