Best visible content extractor available

Question

Asked by najeeb on 02 January 2017 at 10:12 · 148 views

My application needs the visible content of a given URL: just the text, with no HTML and no header or footer data. Right now I am using BeautifulSoup and boilerpipe for this, but in some rare cases I don't get enough data, or I don't get the right data. Is there any competing tool that does this better? The programming language is not a barrier.
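To make "visible text, no header or footer" concrete, here is a minimal sketch using only Python's standard-library `html.parser`. It skips everything inside `script`, `style`, `head`, `header`, `footer`, `nav` and `noscript` and keeps the remaining text. The tag lists and the names `VisibleTextParser` and `visible_text` are assumptions for illustration. This is not how BeautifulSoup or boilerpipe work internally. boilerpipe in particular relies on heuristics beyond tag names, which is why a fixed tag list like this one will miss content on some pages.

```python
from html.parser import HTMLParser

# Assumed list of elements treated as non-visible or as page chrome.
SKIP_TAGS = {"script", "style", "noscript", "head", "header", "footer", "nav"}
# Assumed list of block-level tags that should separate words in the output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class VisibleTextParser(HTMLParser):
    """Collects text that is not nested inside any element in SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.chunks.append("\n")  # keep adjacent blocks from gluing together

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    page = (
        "<html><head><title>T</title><script>var x = 1;</script></head>"
        "<body><header>Menu</header><p>Hello <b>world</b>.</p>"
        "<p>Second</p><footer>(c) 2017</footer></body></html>"
    )
    print(visible_text(page))  # Hello world. Second
```

Pages that put their navigation or footer in plain `div`s instead of semantic tags will slip through this filter, which is one reason tag-based extraction sometimes returns "not the right data".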

There is 1 answer

**eLRuLL** · Answer 1 · 2017-01-02T13:19:34+00:00

I would recommend using XPath or CSS selectors directly for content extraction; both kinds of selector are already implemented, with a simple interface, in the parsel module.
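The idea behind selector-based extraction is that you target the content block explicitly instead of guessing which text is visible. parsel is a third-party package, so to keep the sketch below self-contained it uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed (XHTML-like) markup. The sample page, the `div class="content"` layout and the function name `extract_paragraphs` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed page; real-world HTML usually needs a lenient parser.
SAMPLE = """<html>
  <body>
    <header><p>Site header</p></header>
    <div class="content">
      <p>First paragraph.</p>
      <p>Second <b>bold</b> paragraph.</p>
    </div>
    <footer><p>Copyright notice</p></footer>
  </body>
</html>"""


def extract_paragraphs(xhtml, path=".//div[@class='content']//p"):
    """Return the text of each element matching an ElementTree XPath-subset path."""
    root = ET.fromstring(xhtml)
    # itertext() also yields text of nested inline tags such as <b>.
    return ["".join(el.itertext()).strip() for el in root.iterfind(path)]


if __name__ == "__main__":
    print(extract_paragraphs(SAMPLE))
    # ['First paragraph.', 'Second bold paragraph.']
```

With parsel itself, which tolerates real-world HTML, the same selection would be written roughly as `Selector(text=html).xpath("//div[@class='content']//p//text()").getall()` or, with a CSS selector, `Selector(text=html).css("div.content p::text").getall()`. Note that the CSS form with `p::text` returns only the direct text nodes of each `p`, not text inside nested tags like `<b>`.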

For a complete web-crawling and content-extraction suite, Scrapy would be my preferred option.

And if you want to visually select which parts of the HTML to extract, I would recommend Portia.

Hope that helped.
