source code of web page not available using urllib.urlopen()


I am trying to get video links from 'https://www.youtube.com/trendsdashboard#loc0=ind'. When I inspect the page elements in the browser, I can see the HTML for each video. But the source retrieved using

urllib2.urlopen("https://www.youtube.com/trendsdashboard#loc0=ind").read()

does not contain the HTML for the videos. Is there any other way to do this?

<a href="/watch?v=dCdvyFkctOo" alt="Flipkart Wish Chain">
        <img src="//i.ytimg.com/vi/dCdvyFkctOo/hqdefault.jpg" alt="Flipkart Wish Chain">
      </a>

This simple markup appears when I inspect elements in the browser, but not in the source code retrieved by urllib.

There are 4 answers

1
Alexander McFarlane On

works for me...

import urllib2
url = 'https://www.youtube.com/trendsdashboard#loc0=ind'
html = urllib2.urlopen(url).read()

IMO I'd use requests instead of urllib - it's a bit easier to use:

import requests
url = 'https://www.youtube.com/trendsdashboard#loc0=ind'
response = requests.get(url)
html = response.content

Edit

This will get you a list of all <a></a> tags with hyperlinks as per your edit. I use the library BeautifulSoup to parse the html:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
links = [tag for tag in soup.find_all('a') if tag.has_attr('href')]
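
If only the video links are wanted, the list can be narrowed to hrefs starting with /watch?v=. Below is a minimal sketch of that idea, assuming Python 3 and using the standard library's html.parser instead of BeautifulSoup so it runs without extra installs. WatchLinkParser and extract_watch_links are hypothetical names made up for this example, and the sample HTML is the snippet from the question. Note it can only find links that are actually present in the HTML it is given.

```python
from html.parser import HTMLParser


class WatchLinkParser(HTMLParser):
    """Collects href values of <a> tags that start with /watch?v= (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith("/watch?v="):
            self.links.append(href)


def extract_watch_links(html):
    """Return all /watch?v=... hrefs found in the given HTML text."""
    parser = WatchLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Sample markup copied from the question
sample = '''<a href="/watch?v=dCdvyFkctOo" alt="Flipkart Wish Chain">
  <img src="//i.ytimg.com/vi/dCdvyFkctOo/hqdefault.jpg" alt="Flipkart Wish Chain">
</a>'''
print(extract_watch_links(sample))
```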
3
Ajay On

To view the source code you need to use the read method. If you just use urlopen, it gives you something like this:

In [12]: urllib2.urlopen('https://www.youtube.com/trendsdashboard#loc0=ind')
Out[12]: <addinfourl at 3054207052L whose fp = <socket._fileobject object at 0xb60a6f2c>>

To see the source, call read():

urllib2.urlopen('https://www.youtube.com/trendsdashboard#loc0=ind').read()
5
Vikas Ojha On

Whenever you compare the source code seen by Python with what the web browser shows, don't do it through Inspect Element. Right-click on the webpage and click View Source instead; that shows the actual source the server returned. Inspect Element displays the aggregated result of all the network requests the page made, plus whatever the JavaScript code changed while executing.

Keep the Developer Console open before opening the webpage, stay on the Network tab, and make sure that 'Preserve Log' is enabled in Chrome (or 'Persist' for Firebug in Firefox). Then you will see all the network requests made.
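
The same comparison can be done from Python: check whether the markup you saw in Inspect Element occurs at all in the raw text you downloaded. Here is a rough sketch; missing_from_static_source is a hypothetical helper, and static_html is a made-up stand-in for what urlopen(...).read() might return for a JavaScript-driven page, not the real YouTube response.

```python
def missing_from_static_source(static_html, markers):
    """Return the markers that do not occur anywhere in the raw HTML text."""
    return [m for m in markers if m not in static_html]


# Made-up stand-in for a page whose content is filled in later by JavaScript
static_html = (
    '<html><body><div id="content"></div>'
    '<script src="/app.js"></script></body></html>'
)

# Strings copied from what Inspect Element showed (the second is from the question)
markers = ['id="content"', '/watch?v=']
print(missing_from_static_source(static_html, markers))
```

Anything this reports as missing was probably added after page load, so the place to look for it is the Network tab rather than the page source.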

1
rishav On

We also need to decode the data from UTF-8. Here is the code:

response = response.decode('utf-8')
print(response)
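
A slightly more defensive version of that, as a sketch: it assumes Python 3, where read() returns bytes, and it assumes the page really is UTF-8 encoded, which is not guaranteed for every site. to_text is a hypothetical helper name.

```python
def to_text(raw, encoding="utf-8"):
    """Decode bytes to str; pass through values that are already text."""
    if isinstance(raw, bytes):
        # errors="replace" swaps undecodable bytes for U+FFFD instead of raising
        return raw.decode(encoding, errors="replace")
    return raw


# Stand-in for the bytes that read() would return
raw = "<title>caf\u00e9</title>".encode("utf-8")
text = to_text(raw)
print(type(raw).__name__, type(text).__name__)
```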