Python urlparse gives wrong result

Question

321 views Asked by MikeSchem At 31 August 2017 at 23:01

I'm trying to separate the different parts of a URL with Python's urlparse, but I seem to be getting the wrong values in the results.

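import urlparse

# runSql and conn are defined elsewhere; runSql returns a list of rows
baseline = runSql(conn, "Select url from malware_traffic where tag = 'baseline';")

for i in baseline:
    print i[0]                     # raw URL string from the first column
    print urlparse.urlparse(i[0])  # parsed components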

The runSql function just returns a list of rows, each containing a URL. I loop through them and try to split each URL from the baseline variable into its components, but the way Python parses the URLs seems to be incorrect:

172.217.9.174:443/c2dm/register3
ParseResult(scheme='172.217.9.174', netloc='', path='443/c2dm/register3', params='', query='', fragment='')
connectivitycheck.gstatic.com:80/generate_204
ParseResult(scheme='connectivitycheck.gstatic.com', netloc='', path='80/generate_204', params='', query='', fragment='')
www.google.com:80/gen_204
ParseResult(scheme='www.google.com', netloc='', path='80/gen_204', params='', query='', fragment='')
172.217.9.174:443/auth/devicekey
ParseResult(scheme='172.217.9.174', netloc='', path='443/auth/devicekey', params='', query='', fragment='')

In the results you can clearly see that it puts the host in scheme, leaves netloc empty, and includes the port in path.

For instance, the first result should be this:

ParseResult(scheme='', netloc='172.217.9.174:443', path='/c2dm/register3', params='', query='', fragment='')

I'm not sure why it's getting messed up.

I'm using practically the same thing as one of the examples in the documentation here: https://docs.python.org/2/library/urlparse.html

So what am I doing wrong, or is it a bug?

There is 1 answer

**Aran-Fey** · Accepted Answer · 2017-08-31T23:09:22+00:00

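The problem is that your URLs don't have a scheme (the http:// part), so Python treats everything before the first colon, 172.217.9.174, as the scheme. urlparse only recognizes a netloc when it is introduced by //, so the rest of the string ends up in path. Prefixed with http://, everything works as expected: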

>>> from urlparse import urlparse
>>> urlparse('172.217.9.174:443/c2dm/register3')
ParseResult(scheme='172.217.9.174', netloc='', path='443/c2dm/register3', params='', query='', fragment='')
>>> urlparse('http://172.217.9.174:443/c2dm/register3')
ParseResult(scheme='http', netloc='172.217.9.174:443', path='/c2dm/register3', params='', query='', fragment='')
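
If the URLs come out of a database without a scheme, one option is to add one before parsing. The following is a minimal sketch, not a definitive implementation: parse_host_url and default_scheme are hypothetical names made up for this example, and it assumes every scheme-less entry starts with a host (optionally followed by a port). The import falls back to the Python 2 urlparse module used in the question.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2, as in the question


def parse_host_url(url, default_scheme="http"):
    """Parse a URL, prepending a scheme first if none is present.

    Hypothetical helper for illustration; the "://" check is a simple
    heuristic and assumes scheme-less input starts with the host.
    """
    if "://" not in url:
        url = default_scheme + "://" + url
    return urlparse(url)


result = parse_host_url("172.217.9.174:443/c2dm/register3")
print(result.netloc)    # 172.217.9.174:443
print(result.path)      # /c2dm/register3
print(result.hostname)  # 172.217.9.174
print(result.port)      # 443

# Prefixing only "//" marks the start of the netloc without naming a scheme,
# which leaves the scheme empty.
bare = urlparse("//172.217.9.174:443/c2dm/register3")
print(bare.scheme == "", bare.netloc, bare.path)
```

The "//" variant yields the empty scheme the question expected. Note that the "://" check would misfire on a scheme-less URL whose query string itself contains "://", so a stricter test may be needed for such data.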
