I want to remove multiple slashes with haproxy:
$ curl -I "http://www.host.com//some_path/sub_path/slug/"
and make it send a 301 response for SEO purposes:
HTTP/1.1 301 Moved Permanently
Cache-Control: no-cache
Content-length: 0
Location: /some_path/subpath/slug/
First approach: Create ACL -> replace slashes -> redirect
acl multiple-bars path_reg /{2,}
reqrep ^([^\ :]*)\ /{2,}(.*) \1\ /\2 if multiple-bars
...
redirect prefix / code 301 if multiple-bars
The reqrep changes the URL so that "multiple-bars" is no more true. Therefore I got no 301 redirect
$ curl -I "http://www.host.com//some_path/sub_path/slug/"
HTTP/1.1 200 OK
domain=www.host.com
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Content-Type: text/html; charset=UTF-8
Date: Tue, 09 Jun 2015 18:16:36 GMT
Second approach: Create a Custom Header and redirect if it finds the header
acl multiple-bars path_reg /{2,}
reqadd X-HadMultipleBars if multiple-bars
reqrep ^([^\ :]*)\ /{2,}(.*) \1\ /\2 if multiple-bars
...
redirect prefix / code 301 if { hdr_cnt(X-HadMultipleBars) 1 }
But HAProxy first processes reqrep rules and then reqadd. So:
[ec2-user@haproxy-stage-m3m-a haproxy]$ sudo service haproxy restart
Stopping haproxy: [ OK ]
Starting haproxy: [WARNING] 159/181957 (11430) : parsing [/etc/haproxy/haproxy.cfg:87] : a 'reqrep' rule placed after a 'reqadd' rule will still be processed before.
[WARNING] 159/181957 (11430) : parsing [/etc/haproxy/haproxy.cfg:103] : a 'reqidel' rule placed after a 'reqadd' rule will still be processed before.
[WARNING] 159/181957 (11430) : parsing [/etc/haproxy/haproxy.cfg:104] : a 'reqrep' rule placed after a 'reqadd' rule will still be processed before.
[ OK ]
It won't redirect:
$ curl -I "http://www.host.com//some_path/sub_path/slug/"
HTTP/1.1 200 OK
domain=www.host.com
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Content-Type: text/html; charset=UTF-8
Date: Tue, 09 Jun 2015 18:16:36 GMT
Third approach:
acl multiple-bars path_reg /{2,}
reqrep ^([^\ :]*)\ /{2,}(.*) \1\ /\2\nX-HadMultipleBars: if multiple-bars
...
redirect prefix / code 301 if { hdr_cnt(X-HadMultipleBars) 1 }
I don't like this hack, it looks dirty, but it works: the hdr_cnt
gets the value of 1.
It works with multiple slashes:
$ curl -I "http://www.host.com/////some_path/sub_path/slug/"
HTTP/1.1 301 Moved Permanently
Cache-Control: no-cache
Content-length: 0
Location: /some_path/sub_path/slug/
But it doesn't work with sub_path multiple slashes:
$ curl -I "http://www.host.com//////some_path///sub_path////slug/////"
HTTP/1.1 301 Moved Permanently
Cache-Control: no-cache
Content-length: 0
Location: /some_path///sub_path////slug/////
I'm having trouble to make this regex somewhat recursive, but this code removes some 'random' slashes at each time:
http://www.host.com//////some_path///sub_path////slug/////
301 -> Location: /some_path///sub_path////slug/////
301 -> Location: /some_path//sub_path//slug///
301 -> Location: /some_path/sub_path/slug//
301 -> Location: /some_path/sub_path/slug/
200!
I tried to use this regex which works at http://www.regexr.com/ instead of backreferencing to ([^\ :]*)
#Parses Valid URL characters:
([!#$&-.0-;=?-[\]_a-z~]|%[0-9a-fA-F]{2})+
but it doesn't work on HAProxy 1.4 :( Can someone give me some help here?
Thanks!
on haproxy version > 1.6