I want to add a scheme to URLs when one is not already present.
import urlparse
p = urlparse.urlparse(url)
print p
netloc = p.netloc or p.path
path = p.path if p.netloc else ''
scheme = p.scheme or 'http'
p = urlparse.ParseResult(scheme, netloc, path, *p[3:])
url = p.geturl()
print url
The above code works fine when the URL has no port number. When a port number is present, it produces unexpected output. For example:
input go.com:8000/3/
output go.com://8000/3/
The same happens with localhost. What approach should I follow in this case?
If you have a port number but no URL scheme, your URL must start with //. urlparse recognizes a netloc only if it is properly introduced by '//'. Otherwise the input is presumed to be a relative URL and thus to start with a path component. Compare the following three samples and observe the difference:
1) In the first sample I added // so that the parser identifies what follows as the netloc rather than the scheme; the path comes after it.
2) In the second sample there is no scheme, no //, and no port number, so the entire URL is treated as the path.
3) In the third sample I specified the port but no //. Because a scheme is terminated by a colon, the parser takes everything before the : as the scheme and everything after it as the path.
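The three cases can be reproduced with a short sketch like the one below. The host `go.com` and port `8000` come from the question; the variable names are only illustrative, and the import fallback is there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

# 1) Leading '//': host and port end up in netloc, the rest is the path.
with_slashes = urlparse('//go.com:8000/3/')
print(with_slashes.scheme, with_slashes.netloc, with_slashes.path)

# 2) No scheme, no '//', no port: the whole string is treated as the path.
bare = urlparse('go.com/3/')
print(bare.scheme, bare.netloc, bare.path)

# 3) Port but no '//': the text before ':' is taken as the scheme,
#    and everything after it becomes the path.
with_port = urlparse('go.com:8000/3/')
print(with_port.scheme, with_port.netloc, with_port.path)
```

Only the first form puts `go.com:8000` into `netloc`, which is why the code in the question ends up producing `go.com://8000/3/` for the third form.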
This is how urlparse parses the URL. To get your scheme handling to work, check for :// in the URL; if you don't find it, explicitly prepend // to the URL before parsing, and the job will be done.
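A minimal sketch of that check follows. The helper name `add_scheme` and its `default_scheme` parameter are made up for illustration, and the `'://'` test is deliberately naive: it assumes the inputs are host-style web URLs such as `go.com:8000/3/`, not relative paths or schemes like `mailto:`.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def add_scheme(url, default_scheme='http'):
    """Hypothetical helper: return url with default_scheme added if it has none."""
    if '://' not in url and not url.startswith('//'):
        # Prepend '//' so urlparse treats the leading host[:port] as the netloc.
        url = '//' + url
    parsed = urlparse(url)
    # Keep an existing scheme, otherwise fall back to the default.
    scheme = parsed.scheme or default_scheme
    return parsed._replace(scheme=scheme).geturl()


print(add_scheme('go.com:8000/3/'))
print(add_scheme('localhost:8000'))
print(add_scheme('https://go.com/3/'))
```

With this approach there is no need to swap `path` into `netloc` by hand, because the parser already fills `netloc` correctly once the `//` is in place.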
For more detail, see the documentation: https://docs.python.org/2/library/urlparse.html