Unable to extract web content (href tags); I'm using Python 3.7

Question

Asked by jack suresh on 13 September 2020 at 05:02 (42 views)

I am unable to scrape the href attributes (@href) from "https://www.theaic.co.uk/aic/analyse-investment-companies". I'm using Python 3.7 with Scrapy and Splash, and I also tried Selenium, but without success.

There is 1 answer

**Andrej Kesely** · Answer 1 · 2020-09-13T08:58:46+00:00

The table you see on the page is inside an <iframe>, so its links are not part of the outer page's HTML. You have to load the iframe's source first:

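import requests
from bs4 import BeautifulSoup

url = 'https://www.theaic.co.uk/aic/analyse-investment-companies'

# Fetch the outer page; the table itself is not in this HTML.
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# This assumes the iframe's src is protocol-relative ("//host/path"), hence the 'https:' prefix.
iframe_url = 'https:' + soup.article.iframe['src']
soup = BeautifulSoup(requests.get(iframe_url).content, 'html.parser')

# Print the link inside each element with the gridFundName class.
for a in soup.select('.gridFundName a'):
    print(a['href'])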

Prints:

http://www.theaic.co.uk/3IN
http://www.theaic.co.uk/AAIF
http://www.theaic.co.uk/ADIG
http://www.theaic.co.uk/AEMC
http://www.theaic.co.uk/AJIT
http://www.theaic.co.uk/ALAI
http://www.theaic.co.uk/ABD
http://www.theaic.co.uk/ANII
http://www.theaic.co.uk/ANW
http://www.theaic.co.uk/ASCI
http://www.theaic.co.uk/AASC
http://www.theaic.co.uk/AAS
http://www.theaic.co.uk/ASEI
http://www.theaic.co.uk/ASLI
http://www.theaic.co.uk/ASL
http://www.theaic.co.uk/ASIT
http://www.theaic.co.uk/ASIZ
http://www.theaic.co.uk/AIF
http://www.theaic.co.uk/AIFZ
http://www.theaic.co.uk/AEWU
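
The answer's code prepends 'https:' to the iframe's src, which only works while that src is protocol-relative. As a hedged alternative, here is a minimal sketch that resolves the src with `urllib.parse.urljoin`, which also handles relative and absolute values. It uses only the standard library so it runs without network access; `IframeSrcParser`, `iframe_urls`, and the sample markup are hypothetical names and data made up for illustration, not taken from the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(page_url, html):
    """Return absolute URLs for all iframes found in html."""
    parser = IframeSrcParser()
    parser.feed(html)
    # urljoin resolves protocol-relative, relative, and absolute src values.
    return [urljoin(page_url, src) for src in parser.sources]


# Hypothetical markup for illustration; not copied from the real page.
sample_html = """
<article>
  <iframe src="//example.com/embed/table"></iframe>
  <iframe src="/local/frame.html"></iframe>
  <iframe src="https://other.example.org/x"></iframe>
</article>
"""

page_url = "https://www.theaic.co.uk/aic/analyse-investment-companies"
for resolved in iframe_urls(page_url, sample_html):
    print(resolved)
```

With a resolved URL in hand, the second request and the `.gridFundName a` selection from the answer can stay as they are.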
