StringUtil indexOf() equivalent postgreSQL query

10.5k views Asked by At

I need to implement stringUtils Class indexOf() method in postgresql.

Lets say I have a table in which url is one of the column.

url : "http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit"

My requirement is to find the index of the 3rd occurence of '/' in the above url and do substring and take only paypal-info.com host name in Postgresql Query

Any idea on implementing this would be grateful. Thanks

3

There are 3 answers

4
Michał Schielmann On

Have you tried split_part method?

SELECT split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 3)

Result:

split_part
paypal-info.com

For other string functions try this doc: http://www.postgresql.org/docs/9.1/static/functions-string.html

Edit: as for indexOf itself I don't know any built-in postgres solution. But using two string functions You can achieve it like this:

SELECT strpos('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 4)) - 1 as index_of;
6
Vlad On

Just replace 3 with N to get the index of the Nth '/' in a given string

SELECT length(substring('http://asd/asd', '(([^/]*/){3})')) - 1

To extract the host name from url you can use

SELECT substring('http://asd.com:234/qwe', 'http://([^:]+).*/')

Tested here: SQLFiddle

4
Craig Ringer On

The string functions and operators section of the manual is the equivalent of String.indexOf, e.g.

select position('/' in 'http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');

however it doesn't offer the option to get the n'th occurrence.

You're really approaching this all wrong. You should use proper URL parsing code to extract the host portion, not attempt to roll your own or use regex / splitting / string mangling.

PostgreSQL doesn't have a native URL/URI type, but its procedural languages do and it's trivial to wrap suitable functions. e.g. with PL/Python:

create language plpythonu;
create or replace function urlhost(url text) returns text 
language plpythonu 
immutable strict
as $$
import urlparse
return urlparse.urlparse(url).netloc
$$;

then:

regress=# select urlhost('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');
     urlhost     
-----------------
 paypal-info.com
(1 row)

If you'd prefer to use PL/Perl, PL/V8, or whatever, that's fine.

For best performance, you could write a simple C function and expose that as an extension.