I'm building a simple scraper using Request.js and Cheerio.js within Express. Right now I'm only looking for the title of the site. Instead scraping a website one by one, I put the list in an array. I parse through them and then use Cheerio.js to find the Title of the website. When I console log the titles, they come out fine, but I want to ultimately show them on a html page. Please note, I'm very new to programming so if you could provide detailed feedback, that would be incredibly helpful (below is the code I've been working on). Thanks in advance!
function parseSites(urls) {
var parsedSites = [];
urls.forEach(function(site) {
request(site, function(err, res, body) {
if(err) {
console.log(err);
} else {
var $ = cheerio.load(body);
parsedSites.push($('title').text());
}
}
});
});
return parsedSites;
}
First you need to understand the difference between asynchronous and synchronous code. Lets see an example:
-
-
The next point is that there is an error in your code!
So your problem is that the 'request' in the forEach loop is an async function that will call the callback 'function(err, res, body)' once there is a response from the web page.
My solutions for this:
Async github page
Async example