A well-known issue: Google's indexing engine reports that it can see 2x2x2=8 duplicates of the URLs, which differ by:
- http - https
- www - no www
- root/ - root/index.php
(8 duplicates for the root URL, and 4 duplicates for every other page URL)
I use the following working code in .htaccess to 301-redirect all of the duplicates:
RewriteEngine On
# first, www and root together
RewriteCond %{REQUEST_URI} ^/$
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/index.php [R=301,L]
# remove www
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L]
# add index.php to the root url
RewriteCond %{REQUEST_URI} ^/$
RewriteRule ^(.*)$ https://%{HTTP_HOST}/index.php [R=301,L]
# finally, force https if none of the earlier conditions are met
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
The above code works well, removing all duplicates with a 301 redirect. However, I believe it can be written in a more elegant way, possibly without duplicating rewrite conditions/rules.
BTW I found hundreds(!) of posts giving advice and examples of the related .htaccess statements, and all of them are either incomplete or wrong! They usually stop after one condition is met, or don't result in a 301 code in every case.
Firstly (already discussed in comments), your canonical "root" URL should simply be /, not /index.php. Users should not see index.php in the URL, and if the canonical is /index.php then you will always have to redirect users (and search engines) that type/request the root domain, and they will share URLs including the extra index.php, which is not good for anyone. Your canonical link element for the root then simply points to the root URL (ending in /), and all internal links/anchors should state href="/", not href="/index.php".
Otherwise, the only "minor" issue I see in the rules you've posted is treating the root differently in the www to non-www redirect and having a separate rule for this. Having separate rules is generally fine, provided the number of redirects is kept minimal, which it appears to be here.
Providing you are not implementing HSTS (in which case you would need to redirect to HTTPS first - on the same host - before other canonicalisation redirects) then you could combine these rules into one and minimise the number of redirects. For example:
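A minimal sketch of what such a combined rule might look like. The conditions are an assumption consistent with the description here (three OR'd checks plus a fourth that captures the hostname), not a verbatim listing. Checking THE_REQUEST rather than REQUEST_URI for index.php is also an assumption, intended to avoid a possible redirect loop when index.php is served internally as the DirectoryIndex.

```apache
RewriteEngine On

# Redirect if ANY of the first three conditions is true...
# 1. The request is not HTTPS (assumes SSL terminates on this server, not a proxy)
RewriteCond %{HTTPS} !on [OR]
# 2. The host starts with www.
RewriteCond %{HTTP_HOST} ^www\. [NC,OR]
# 3. The URL the client actually requested ends in index.php
RewriteCond %{THE_REQUEST} \s/(?:[^?\s]*/)?index\.php[?\s]
# 4. ...AND capture the hostname without www. (and without a trailing dot) as %1.
#    This condition always matches; it exists only to populate %1.
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+?)\.?$ [NC]
# One redirect to the canonical scheme + host + path (index.php stripped)
RewriteRule ^(.*?)(?:(^|/)index\.php)?$ https://%1/$1$2 [R=301,L]
```

Because every non-canonical variant is caught by a single rule, each request should need at most one 301 to reach the canonical URL. Test with a 302 first, since browsers cache 301 responses.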
I've kept the rules "generic" by not explicitly stating the domain name (but this then requires the 4th condition to get the hostname less the www subdomain). The rules could be "simplified" and would arguably be "more reliable" if the canonical hostname was hardcoded (depending on the server/environment).
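As a sketch of the hardcoded variant, assuming example.com as a placeholder for the canonical hostname (substitute your own), the hostname-capturing condition is no longer needed and the host check can simply be "anything that is not exactly the canonical host":

```apache
RewriteEngine On

# example.com is a placeholder for the canonical hostname
RewriteCond %{HTTPS} !on [OR]
# Any host other than the canonical one (www or otherwise)
RewriteCond %{HTTP_HOST} !^example\.com$ [NC,OR]
# The URL the client actually requested ends in index.php
RewriteCond %{THE_REQUEST} \s/(?:[^?\s]*/)?index\.php[?\s]
RewriteRule ^(.*?)(?:(^|/)index\.php)?$ https://example.com/$1$2 [R=301,L]
```

This does not depend on the Host header being well formed, which is why it could be considered more reliable in some environments.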
The regex ^(.*?)(?:(^|/)index\.php)?$ matches any URL, but excludes the optional index.php (or /index.php) on the end of the URL in the first capturing subpattern ($1). Note that this regex does not only handle /index.php in the root, but also subdirectories, e.g. /foo/bar/index.php (which ultimately redirects to /foo/bar/). The $2 backreference simply contains the trailing slash when removing index.php from a subdirectory, otherwise it is empty.
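To make the capture groups concrete, here is the same pattern in a standalone rule that only strips index.php, with a few sample URL-paths traced in comments. The guard condition on THE_REQUEST is an assumption added so the rule only fires when the client explicitly requested index.php; the sample paths are illustrative.

```apache
RewriteEngine On

# In .htaccess the URL-path matched by RewriteRule has no leading slash.
#
#   URL-path             $1            $2        /$1$2
#   index.php            (empty)       (empty)   /
#   foo/bar/index.php    foo/bar       /         /foo/bar/
#   foo/bar              foo/bar       (empty)   /foo/bar      (guard blocks redirect)
#   myindex.php          myindex.php   (empty)   /myindex.php  (guard blocks redirect)
#
# Only redirect when the originally requested URL ends in /index.php
RewriteCond %{THE_REQUEST} \s/(?:[^?\s]*/)?index\.php[?\s]
RewriteRule ^(.*?)(?:(^|/)index\.php)?$ /$1$2 [R=301,L]
```

The lazy (.*?) is what lets the optional index.php group claim the end of the path; a greedy (.*) would swallow index.php into $1 and nothing would be removed.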