I am working with cheerio.js to make a simple web scraper. For some reason it does not match certain HTML elements. One div I cannot target is the div with the class 'dataTables_scrollBody' on the website that I am scraping: http://www.caffeineinformer.com/the-caffeine-database.
However, I think I found a work-around to my problem.
I read through the documentation (https://github.com/cheeriojs/cheerio) and am following this format: $( selector, [context], [root] ).
$(".main, div:nth-child(3) ").filter(function(){
    var data = $(this).prev().text();
    console.log(data);
});
In my console I am getting the data that I want, but with two problems:
1. The output includes text that I do not see on the page:
Caffeine Content of Drinks All Coffee Soda Energy Drinks Tea Shots
Loading data.../*<![CDATA[*/var totalrows=1127;
var latestdate='06/12/2015';var tbldata=
2. I am getting my data back twice.
I added a console.log for the data length and got back 8 different lengths. I believe there is a workaround, but I cannot figure it out.
Does anyone have any knowledge on the matter?
DataTables is a JavaScript library that dynamically creates, inserts and modifies HTML elements in the DOM after the page has loaded. The table you want to scrape is created dynamically in the browser, but your scraper only sees the static HTML returned by the server, so elements such as 'dataTables_scrollBody' do not exist in what cheerio parses.
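To illustrate the point, here is a minimal sketch that reads the data straight out of the raw page text instead of looking for table elements. It relies on the `var tbldata=` assignment visible in the console output from the question. Everything else is an assumption: `sampleHtml` is a made-up, shortened stand-in for the real page source, `extractTblData` is a hypothetical helper name, and the value of `tbldata` is assumed to be a JSON-compatible array literal terminated by `];`. It uses only built-in string and regex features, so it runs in Node without cheerio.

```javascript
// Hypothetical, shortened stand-in for the static page source.
// The real tbldata format is not shown in the question; a JSON-compatible
// array of rows is ASSUMED here purely for illustration.
const sampleHtml = [
  '<div class="main">',
  '<div>Loading data...</div>',
  '<script>/*<![CDATA[*/var totalrows=2;',
  'var latestdate=\'06/12/2015\';var tbldata=[["Drink A",8,95],["Drink B",12,34]];',
  '/*]]>*/</script>',
  '</div>'
].join('\n');

// Find the array literal assigned to `tbldata` in the raw HTML and parse it.
// Returns null when no such assignment is present.
function extractTblData(html) {
  // Non-greedy match from the opening "[" to the first "];" that follows.
  // Assumption: no string inside the data itself contains "];".
  const match = html.match(/var\s+tbldata\s*=\s*(\[[\s\S]*?\]);/);
  if (!match) return null;
  return JSON.parse(match[1]);
}

// The static HTML has no DataTables markup, only the script that feeds it.
console.log(sampleHtml.includes('dataTables_scrollBody'));
console.log(extractTblData(sampleHtml));
```

If the real `tbldata` is not valid JSON (for example, single-quoted strings or trailing commas), `JSON.parse` will throw and a more tolerant parsing step would be needed.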
The data that is used to generate the table is stored as JavaScript in the page source, in a variable called tbldata (see this gist). Two possible solutions: