I set up a very basic web scraper to check stock of a specific item on Costco.com for my grandfather. It's working great locally, but when I run it through Heroku it fails (seemingly 50% of the time). Here's the code for the scraper
const task = () => {
// toggle so doesn't send message multiple times if continuously available
let alreadyAvailable = false;
let url = 'http://www.costco.com/Kirkland-Signature-Four-Piece-Urethane-Cover-Golf-Ball,-2-dozen.product.100310467.html';
request(url, function(error, response, html){
let $ = cheerio.load(html);
if(error){
throw new Error(error);
}
if ( $('#product-page #product-details #ctas #add-to-cart input[type="button"]')['0'].attribs.value === 'Out of Stock') {
alreadyAvailable = false;
console.log("still out of stock");
} else {
if (alreadyAvailable === false) {
sendMessage();
alreadyAvailable = true;
}
}
});
};
and here are the logs
2016-12-25T03:48:39.675549+00:00 heroku[scheduler.5440]: Starting process with command `node scraper.js`
2016-12-25T03:48:40.262503+00:00 heroku[scheduler.5440]: State changed from starting to up
2016-12-25T03:48:41.509416+00:00 app[scheduler.5440]: /app/scraper.js:34
2016-12-25T03:48:41.509432+00:00 app[scheduler.5440]: if ( $('#product-page #product-details #ctas #add-to-cart input[type="button"]')['0'].attribs.value === 'Out of Stock') {
2016-12-25T03:48:41.509433+00:00 app[scheduler.5440]: ^
2016-12-25T03:48:41.509433+00:00 app[scheduler.5440]:
2016-12-25T03:48:41.509434+00:00 app[scheduler.5440]: TypeError: Cannot read property 'attribs' of undefined
2016-12-25T03:48:41.509434+00:00 app[scheduler.5440]: at Request._callback (/app/scraper.js:34:90)
2016-12-25T03:48:41.509435+00:00 app[scheduler.5440]: at Request.self.callback (/app/node_modules/request/request.js:186:22)
2016-12-25T03:48:41.509436+00:00 app[scheduler.5440]: at emitTwo (events.js:106:13)
2016-12-25T03:48:41.509436+00:00 app[scheduler.5440]: at Request.emit (events.js:191:7)
2016-12-25T03:48:41.509436+00:00 app[scheduler.5440]: at Request.<anonymous> (/app/node_modules/request/request.js:1081:10)
2016-12-25T03:48:41.509437+00:00 app[scheduler.5440]: at emitOne (events.js:96:13)
2016-12-25T03:48:41.509437+00:00 app[scheduler.5440]: at Request.emit (events.js:188:7)
2016-12-25T03:48:41.509438+00:00 app[scheduler.5440]: at IncomingMessage.<anonymous> (/app/node_modules/request/request.js:1001:12)
2016-12-25T03:48:41.509438+00:00 app[scheduler.5440]: at IncomingMessage.g (events.js:291:16)
2016-12-25T03:48:41.509439+00:00 app[scheduler.5440]: at emitNone (events.js:91:20)
2016-12-25T03:48:41.509439+00:00 app[scheduler.5440]: at IncomingMessage.emit (events.js:185:7)
2016-12-25T03:48:41.509439+00:00 app[scheduler.5440]: at endReadableNT (_stream_readable.js:974:12)
2016-12-25T03:48:41.509440+00:00 app[scheduler.5440]: at _combinedTickCallback (internal/process/next_tick.js:74:11)
2016-12-25T03:48:41.509440+00:00 app[scheduler.5440]: at process._tickCallback (internal/process/next_tick.js:98:9)
2016-12-25T03:48:41.560539+00:00 heroku[scheduler.5440]: State changed from up to complete
2016-12-25T03:48:41.550655+00:00 heroku[scheduler.5440]: Process exited with status 1
2016-12-25T03:58:42.438807+00:00 app[api]: Starting process with command `node scraper.js` by user [email protected]
2016-12-25T03:58:43.701468+00:00 heroku[scheduler.5038]: Starting process with command `node scraper.js`
2016-12-25T03:58:44.312279+00:00 heroku[scheduler.5038]: State changed from starting to up
2016-12-25T03:58:45.769564+00:00 app[scheduler.5038]: still out of stock
2016-12-25T03:58:45.827867+00:00 heroku[scheduler.5038]: State changed from up to complete
2016-12-25T03:58:45.814921+00:00 heroku[scheduler.5038]: Process exited with status 0
You can see that sometimes I get the console log inside of the if-block and others I get a type error because it's trying to read attributes from an html element that don't exist. I was thinking that this might be a async issue, but I'm not sure how to go about fixing it. I assumed Request wasn't running the callback until it got all the html.
The problem here is what Costco's website is returning.
Your code is failing when parsing the DOM via Cheerio. What this means in your case is that the particular HTML you're attempting to scrape doesn't actually exist (that's what the error is saying).
This could be caused by a few possible things:
What I would do if I were you is this:
I'm willing to bet you will be surprised =)