Python urlparse gives wrong result

I'm trying to separate the different parts of a URL with Python's urlparse, but I seem to be getting the wrong values in the results.

import urlparse

# runSql (defined elsewhere) returns the matching rows; i[0] is the URL column
baseline = runSql(conn, "Select url from malware_traffic where tag = 'baseline';")

for i in baseline:
    print i[0]
    print urlparse.urlparse(i[0])

The runSql function just returns a list of URLs. I loop through them and attempt to split each URL from the baseline variable into its components, but the way Python parses the URLs seems to be incorrect:

172.217.9.174:443/c2dm/register3
ParseResult(scheme='172.217.9.174', netloc='', path='443/c2dm/register3', params='', query='', fragment='')
connectivitycheck.gstatic.com:80/generate_204
ParseResult(scheme='connectivitycheck.gstatic.com', netloc='', path='80/generate_204', params='', query='', fragment='')
www.google.com:80/gen_204
ParseResult(scheme='www.google.com', netloc='', path='80/gen_204', params='', query='', fragment='')
172.217.9.174:443/auth/devicekey
ParseResult(scheme='172.217.9.174', netloc='', path='443/auth/devicekey', params='', query='', fragment='')

In the results you can clearly see that it is mixing up scheme and netloc as well as including the port in path.

For instance the first result should be this.

ParseResult(scheme='', netloc='172.217.9.174:443', path='/c2dm/register3', params='', query='', fragment='')

I'm not sure why it's getting messed up.

I'm using practically the same code as one of the examples in the documentation here: https://docs.python.org/2/library/urlparse.html.

So what am I doing wrong or is it a bug?

There is 1 answer

Aran-Fey On BEST ANSWER

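The problem is that your URLs don't have a scheme (the http:// part), so Python treats everything before the first colon (172.217.9.174) as the scheme and the rest as the path. urlparse only recognizes a netloc when it is introduced by //. Prefixed with http://, everything works as expected: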

>>> from urlparse import urlparse
>>> urlparse('172.217.9.174:443/c2dm/register3')
ParseResult(scheme='172.217.9.174', netloc='', path='443/c2dm/register3', params='', query='', fragment='')
>>> urlparse('http://172.217.9.174:443/c2dm/register3')
ParseResult(scheme='http', netloc='172.217.9.174:443', path='/c2dm/register3', params='', query='', fragment='')
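
If you would rather not invent a scheme for URLs that never had one, a possible alternative is to prepend only // so that the host and port are recognized as the netloc while the scheme stays empty, which matches the result the question expected. Below is a minimal sketch of that idea. The helper name parse_host_url is hypothetical (it is not part of urlparse), and the check for "://" is a simplifying assumption that would misfire on a scheme-less URL whose path or query happens to contain "://".

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2, as used in the question


def parse_host_url(url):
    """Parse a URL that may lack a scheme (hypothetical helper).

    Assumption: a URL without "://" and without a leading "//" starts
    directly with host[:port], so "//" is prepended to make urlparse
    treat that leading part as the netloc.
    """
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlparse(url)


result = parse_host_url("172.217.9.174:443/c2dm/register3")
print(result.netloc)    # 172.217.9.174:443
print(result.path)      # /c2dm/register3
print(result.hostname)  # 172.217.9.174
print(result.port)      # 443
```

URLs that already carry a scheme pass through unchanged, so the same helper could be applied to every row returned by the query.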