I'm brand new to Scrapy. I have learned how to use response.css() to read specific parts of a web page, and I have been avoiding learning XPath. It seems to do the same thing with a different syntax (correct me if I'm wrong).
The site I'm scraping has long paragraphs of text, with an occasional link right in the middle. This sentence with a link to a picture of a dog is an example. I'm not sure if there is a way to have a spider read the text with the links in place (I've only been using
Is there a way, using CSS (preferably) or XPath, to grab all the text in the paragraphs, including the link text, without moving the links or link text out of the sentence? This is hard to put into words, so apologies if I need to re-explain or give an example.
Edit: some clarification is needed, since this was poorly explained initially. A paragraph on this page can look like:
<p>My sentence has a <a href="https://www.google.com">link to google</a> in it.</p>
But when you use response.css("p::text").extract(), that sentence shows up as the list ["My sentence has a ", " in it."], completely dropping the text in the link. My goal is to get: ["My sentence has a link to google in it."]