I have a website I want to scrape, structured as follows:
<p><strong>some headline:</strong> some content etc. blabla </p>
<p><strong>some other headline:</strong> some more content etc. blabla </p>
// and so on...
I scrape it with cheerio as follows:
$('p strong').each(function(i, element){
    console.log($(this).text());
    // gets me the headline
    console.log("Parent:" + $(this).parent().text());
    // gets me the content, but unfortunately, also the headline again
});
For now, I am just logging everything, but later I want to save the headlines and the content in separate variables. However, since the headline (which is inside the <strong> tags) is also part of the <p> tags, my second command, which is meant to get only the content because I already grabbed the headline, returns the headline again along with the content. How can I separate or remove everything inside the <strong> tag and save just the rest of the <p> tag, i.e. only the content?
It is probably simplest to remove the headline element and then read the text of the <p>.