I have to get the hostname & pathname from an URL using regular expression.
URL can be in the form of
a) http://test.com/mary/archive/project.aspx
--> I want http::/test.com/mary
b) http://www.testmary.com/company
--> I want http://www.testmary.com/
c) http://marytest.com/b/about/
--> I want http://marytest.com/
d) http://mary.test.com/b/mary/project.aspx
--> I want http://mary.test.com/b/mary/
Basically I am looking get records that have url "mary" and get only hostname or pathname depending on where "mary" is.
Any help is appreciated.
Thanks R
You don't need CLR and RegEx for this.
Results:
The only thing I didn't understand is why the first row in the output should be missing a trailing slash, while the other rows include it. In my query this trailing slash is included in all rows. If it shouldn't be included only on that one row, you'll need to explain why.
I do recommend getting a better handle on data cleansing, though. Why would you allow some URLs without the http:// prefix?