.htaccess - Configuring an Apache server for Memento


I am trying to configure an Apache server to add an HTTP Link header pointing at a Memento TimeGate URL.

My htaccess:

RewriteEngine on

RewriteCond %{IS_SUBREQ} false
RewriteRule ^/(.*) - [E=ORIGURI:%{HTTP_HOST}/$1]

RewriteRule ^/(.*) - [E=ORIGQRY:]
RewriteCond %{QUERY_STRING} .+
RewriteRule ^/(.*) - [E=ORIGQRY:?%{QUERY_STRING}]

RewriteRule ^/(.*) - [E=ORIGPROTO:http]
RewriteCond %{HTTPS} on
RewriteRule ^/(.*) - [E=ORIGPROTO:https]

Header always set Link \
    "<http://purl.org/memento/timegate/%{ORIGPROTO}e://%{ORIGURI}e%{ORIGQRY}e>;rel=timegate"

.
.
.

unchanged from: http://www.mementoweb.org/tools/apache/

I am testing the code on an XAMPP server, but the response sent by the server is:

.
.
.
Link    <http://purl.org/memento/timegate/(null)://(null)(null)>;rel=timegate
Server  Apache/2.4.3 (Win32) OpenSSL/1.0.1c PHP/5.4.7
.
.
.

What is wrong with the htaccess?

EDIT 1

After removing the leading slashes, as suggested by Jon Lin:

RewriteEngine On
RewriteCond %{IS_SUBREQ} FALSE
RewriteRule ^(.*) - [E=ORIGURI:$1]
RewriteRule ^(.*) - [E=ORIGQRY:]
RewriteCond %{QUERY_STRING} .+
RewriteRule ^(.*) - [E=ORIGQRY:?%{QUERY_STRING}]
RewriteRule ^(.*) - [E=ORIGPROTO:http]
RewriteCond %{HTTPS} on
RewriteRule ^(.*) - [E=ORIGPROTO:https]
Header always set Link "<http://purl.org/memento/timegate/%{ORIGPROTO}e://%{ORIGURI}e%{ORIGQRY}e>;rel=timegate"

The new response sent by the server:

Link <http://purl.org/memento/timegate/http://(null)>;rel=timegate

As we can see, the protocol is now filled in, but the rest of the URL is not. Any other suggestions?


There are 2 answers

RafaSashi (best answer)

Configuring an Apache server for Memento

1. HTTP Headers Using .htaccess

RewriteEngine On

RewriteRule ^(.*) - [E=ORIGPROTO:http]
RewriteCond %{HTTPS} on
RewriteRule ^(.*) - [E=ORIGPROTO:https]
RewriteRule ^(.*) - [E=ORIGURI:%{HTTP_HOST}]
RewriteCond %{THE_REQUEST} \s/+([^\s?]+)
RewriteRule ^ - [E=ORIGQRY:%1]

Header always set Link "<http://purl.org/memento/timegate/%{ORIGPROTO}e://%{ORIGURI}e%{ORIGQRY}e>;rel=timegate"

See: htaccess - how to capture the current rewrited url?
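In the rules above, the capture from THE_REQUEST leaves out the leading slash and stops at the "?", so the header built from them has no "/" between host and path and no query string. A possible variant that keeps both is sketched below. It is only a sketch under assumptions: the variable names are reused from the question, the [^\s?]* capture is my own adjustment, and it has not been tested on the XAMPP setup described in the question.

```apache
RewriteEngine On

# Scheme: default to http, switch to https when TLS is in use.
RewriteRule ^ - [E=ORIGPROTO:http]
RewriteCond %{HTTPS} on
RewriteRule ^ - [E=ORIGPROTO:https]

# Host plus the path exactly as the client sent it.
# %1 keeps the leading slash and stops before any "?" (assumed adjustment).
RewriteCond %{THE_REQUEST} \s(/[^\s?]*)
RewriteRule ^ - [E=ORIGURI:%{HTTP_HOST}%1]

# Query string: empty by default, "?..." when one is present.
RewriteRule ^ - [E=ORIGQRY:]
RewriteCond %{QUERY_STRING} .+
RewriteRule ^ - [E=ORIGQRY:?%{QUERY_STRING}]

Header always set Link "<http://purl.org/memento/timegate/%{ORIGPROTO}e://%{ORIGURI}e%{ORIGQRY}e>;rel=timegate"
```

Reading the path from THE_REQUEST rather than from $1 means the value does not depend on which directory the .htaccess file lives in, and it reflects the URL the client asked for even if other rules rewrite it later.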

2. HTTP Headers Using PHP

    function get_canonical_url($proto = 'http://') {
        // Start with the scheme, then append the host name.
        $canonical_url = $proto . $_SERVER["SERVER_NAME"];
        // Include the port only when it is not the default HTTP port.
        if ($_SERVER["SERVER_PORT"] != "80") {
            $canonical_url .= ":" . $_SERVER["SERVER_PORT"];
        }
        // REQUEST_URI holds the path plus any query string.
        return $canonical_url . $_SERVER["REQUEST_URI"];
    }

    // Send the URL as an HTTP Link header; this must run before any output.
    header('Link: <' . get_canonical_url() . '>; rel="canonical"');

Resources: http://moz.com/blog/how-to-advanced-relcanonical-http-headers

Jon Lin

Your rules use the pattern ^/(.*), which will never match when the rules are in an htaccess file. mod_rewrite strips the leading slash from the URI before applying rules in an htaccess file. Remove the leading slashes from your patterns:

RewriteRule ^(.*) - [E=ORIGURI:%{HTTP_HOST}/$1]

etc...
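A likely reason the path is still "(null)" in EDIT 1 of the question, offered as an assumption rather than something confirmed in this thread: %{IS_SUBREQ} expands to lowercase "true" or "false", and a RewriteCond pattern is case-sensitive unless [NC] is given. With the pattern written as FALSE the condition would never match, so the rule that sets ORIGURI would be skipped. A minimal sketch of that one rule with the condition adjusted:

```apache
RewriteEngine On

# IS_SUBREQ yields lowercase "false" for a main request;
# [NC] makes the comparison case-insensitive in case the pattern is uppercase.
RewriteCond %{IS_SUBREQ} false [NC]
# No leading slash in the pattern, since this runs in per-directory (.htaccess) context.
RewriteRule ^(.*) - [E=ORIGURI:%{HTTP_HOST}/$1]
```

In an .htaccess file placed in a subdirectory, $1 is relative to that directory, so the value here is the full path only when the file sits in the document root.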