How to get the complete URL address most efficiently?

I'm using a Java program to expand short URLs into their full URLs. Given a Java URLConnection, which of these two approaches is better for getting the expanded URL?

Connection.getHeaderField("Location");

vs

Connection.getURL();

I assume both give the same output. The first approach did not work well for me: only 1 out of 7 URLs was resolved. Would the second approach resolve more of them?

Is there a better approach?

There are 2 answers

Answer by palacsint (best answer):

I'd use the following:

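import static org.junit.Assert.assertEquals;

import java.net.HttpURLConnection;
import java.net.URL;
import org.junit.Test;

// JUnit 4 is assumed here; the class name is only a placeholder.
public class LocationHeaderTest {
    @Test
    public void testLocation() throws Exception {
        final String link = "http://bit.ly/4Agih5";

        final URL url = new URL(link);
        final HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        // Do not follow the redirect, so only bit.ly's own response is fetched.
        urlConnection.setInstanceFollowRedirects(false);

        // Reading a header field sends the request implicitly.
        final String location = urlConnection.getHeaderField("location");
        assertEquals("http://stackoverflow.com/", location);
        // No redirect was followed, so getURL() still reports the short link.
        assertEquals(link, urlConnection.getURL().toString());
    }
}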

With setInstanceFollowRedirects(false), the HttpURLConnection does not follow redirects, so the destination page (stackoverflow.com in the above example) is not downloaded, only the redirect page from bit.ly.

One drawback is that when a resolved bit.ly URL points to another short URL, for example on tinyurl.com, you get the tinyurl.com link, not the URL that tinyurl.com redirects to.
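
One way around that drawback is to keep reading the Location header hop by hop until a response is no longer a redirect. The following is only a minimal sketch of that idea, not a tested unshortener: the class name RedirectChain, the helpers isRedirect, resolveLocation and expand, and the limit MAX_HOPS are all made up for illustration, and it assumes every shortener in the chain answers a plain GET with a 3xx status and a Location header.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class RedirectChain {

    // Hypothetical cap on hops so that a redirect loop cannot run forever.
    static final int MAX_HOPS = 10;

    /** True for the status codes that normally carry a Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a possibly relative Location value against the URL that returned it. */
    static String resolveLocation(String current, String location) {
        return URI.create(current).resolve(location).toString();
    }

    /** Follows Location headers one hop at a time and returns the last URL reached. */
    static String expand(String shortUrl) throws IOException {
        String current = shortUrl;
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            // Handle each redirect ourselves instead of letting the connection do it.
            conn.setInstanceFollowRedirects(false);
            try {
                int status = conn.getResponseCode(); // sends the request
                String location = conn.getHeaderField("Location");
                if (!isRedirect(status) || location == null) {
                    return current; // not a redirect: treat this as the expanded URL
                }
                current = resolveLocation(current, location);
            } finally {
                conn.disconnect(); // the response body is never read
            }
        }
        return current; // gave up after MAX_HOPS hops
    }
}
```

Calling RedirectChain.expand("http://bit.ly/4Agih5") would then return the first URL in the chain that does not redirect any further, at the cost of one request per hop.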

Edit:

To see the response of bit.ly, use curl:

$ curl --dump-header /tmp/headers http://bit.ly/4Agih5
<html>
<head>
<title>bit.ly</title>
</head>
<body>
<a href="http://stackoverflow.com/">moved here</a>
</body>
</html>

As you can see, bit.ly sends only a short redirect page. Then check the HTTP headers:

$ cat /tmp/headers
HTTP/1.0 301 Moved Permanently
Server: nginx
Date: Wed, 06 Nov 2013 08:48:59 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: private; max-age=90
Location: http://stackoverflow.com/
Mime-Version: 1.0
Content-Length: 117
X-Cache: MISS from cam
X-Cache-Lookup: MISS from cam:3128
Via: 1.1 cam:3128 (squid/2.7.STABLE7)
Connection: close

It sends a 301 Moved Permanently response with a Location header (which points to http://stackoverflow.com/). Modern browsers don't show you the HTML page above. Instead they automatically redirect you to the URL in the Location header.
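
To compare the two calls from the question without depending on bit.ly, the sketch below starts a throwaway local server that plays the role of a shortener. Everything in it is hypothetical scaffolding (the class LocationVersusGetUrl, the /short and /long paths, the helper names), and it relies on the JDK's bundled com.sun.net.httpserver package being available. It also relies on HttpURLConnection updating getURL() after it has followed a redirect, which is how OpenJDK behaves but is not something the Javadoc spells out, so treat that part as an assumption.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class LocationVersusGetUrl {

    /** Local stand-in for a shortener: /short answers 301 to /long, /long answers 200. */
    static HttpServer startFakeShortener() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/short", exchange -> {
            exchange.getResponseHeaders().add("Location", "/long");
            exchange.sendResponseHeaders(301, -1); // -1 means no response body
            exchange.close();
        });
        server.createContext("/long", exchange -> {
            byte[] body = "destination".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Redirects off: returns the raw Location header (may be relative or null). */
    static String viaLocationHeader(String link) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(link).toURL().openConnection();
        conn.setInstanceFollowRedirects(false);
        try {
            return conn.getHeaderField("Location");
        } finally {
            conn.disconnect();
        }
    }

    /** Redirects on (the default): returns getURL() after the request has been made. */
    static String viaGetUrl(String link) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(link).toURL().openConnection();
        try {
            conn.getResponseCode(); // forces the request, including any redirects
            return conn.getURL().toString();
        } finally {
            conn.disconnect();
        }
    }

    /** Runs both approaches against the local server and returns { header, getURL }. */
    static String[] demo() {
        try {
            HttpServer server = startFakeShortener();
            try {
                String link = "http://127.0.0.1:" + server.getAddress().getPort() + "/short";
                return new String[] { viaLocationHeader(link), viaGetUrl(link) };
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("Location header: " + result[0]);
        System.out.println("getURL():        " + result[1]);
    }
}
```

The header approach returns exactly what the server sent and costs one small request, while the getURL() approach lets the connection follow the redirect and therefore also requests the destination page.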

Answer by plb:

The following link contains a more complete method along the same lines as the previous answer: https://github.com/cpdomina/WebUtils/blob/master/src/net/cpdomina/webutils/URLUnshortener.java