Capturing Drupal7 DOM content before page load for comparison

We have an MU (multisite) installation of Drupal 7 here at work, and are trying to temporarily hold back the swarm of bots we receive until we get a chance to load our content. I wrote a quick and dirty script to send 503 headers if we find a certain criterion via XPath (this can also be done with strpos/preg_match if the DOM is not formed).

To get the ball rolling, though, I need to figure out how to either:

A) Hijack the Drupal7 bootstrap and pull all content through this filter below

B) ob_flush content through the filter before content is loaded

Worth mentioning: we use a module called Domain Access, which I believe has led me on this crazy chase in the first place. I know for a fact that it meddles with quite a few files...

The issue I am having is figuring out exactly where I can catch the content. It should be possible to push the stream into a variable, strpos it, then release it, correct? I thought that index.php in Drupal 7 would be the place, but I'm a little confused as to where or how I should capture the contents. Here's the script, and hopefully someone can point me in the right direction.

//error_reporting(-1);

/* start query */

$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->load($_SERVER['PHP_SELF']);

$xpath = new DOMXPath($dom);

// if this exists we aren't ready to be read by bots
$query = $xpath->query(".//*[@id='block-views-about-this-site-block']/div/div/div");
// or $query = 'klat-badge'; // if matching a string instead of the DOM

/* end query */

// DOMXPath::query() returns a DOMNodeList (or false for a malformed
// expression), so check its length instead of calling strpos() on it
if ($query !== false && $query->length > 0) {

    // require banlist
    require('botlist.php');

    $str = strtolower('/'.implode('|', array_unique($list)).'/i');
    if (preg_match($str, strtolower($_SERVER['HTTP_USER_AGENT']))) {
        // so tell bots we're broken
        header('HTTP/1.1 503 Service Temporarily Unavailable');
        header('Status: 503 Service Temporarily Unavailable');
        exit;
    }
}

1 answer

Clive (best answer)

It would be a lot easier to just define a constant in a module and check that instead. You could then use hook_init() to make a decision on whether the page is ready before the content is even built:

define('IN_DEVELOPMENT', TRUE);

function mymodule_init() {
  if (IN_DEVELOPMENT) {
    //require banlist
    require('botlist.php'); 

    $str = strtolower('/'.implode('|', array_unique($list)).'/i'); 
    if(preg_match($str, strtolower($_SERVER['HTTP_USER_AGENT']))) {
      //so tell bots we're broken
      header('HTTP/1.1 503 Service Temporarily Unavailable');
      header('Status: 503 Service Temporarily Unavailable');
      exit;
    }
  }
}
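
One thing to watch in the user-agent check: the entries in $list go straight into the regular expression, so an entry containing a "/" or "." would change the pattern, and an empty entry would match every visitor. Below is a minimal sketch of a safer check, assuming botlist.php fills $list with plain user-agent substrings. The function name mymodule_is_blocked_bot() is hypothetical and not part of Drupal.

```php
<?php
// Hypothetical helper for illustration; not a Drupal API.
// Assumes $list is a flat array of plain user-agent substrings.
function mymodule_is_blocked_bot($user_agent, array $list) {
  // Drop duplicates and empty entries: an empty alternative in the
  // pattern would match every user agent.
  $list = array_filter(array_unique($list), 'strlen');
  if (empty($list)) {
    return FALSE;
  }

  // Escape regex metacharacters (and the "/" delimiter) in each entry.
  $escaped = array();
  foreach ($list as $entry) {
    $escaped[] = preg_quote($entry, '/');
  }

  // The "i" flag makes the match case-insensitive, so no strtolower() is needed.
  $pattern = '/' . implode('|', $escaped) . '/i';
  return preg_match($pattern, $user_agent) === 1;
}
```

Inside mymodule_init() the condition would then read something like mymodule_is_blocked_bot($_SERVER['HTTP_USER_AGENT'], $list), with an isset() check first, since some clients send no User-Agent header at all.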

There might be a way to do what you want by loading the whole page content into a DOMDocument, but it won't be easy in Drupal (as I'm sure you've already discovered!) and it certainly won't be efficient.
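
For reference, the general "buffer it, inspect it, release it" idea looks roughly like this in plain PHP. This is only a sketch under assumptions: capture_output() and html_has_node() are hypothetical helper names, the echoed markup stands in for whatever actually renders the page, and nothing here is Drupal-specific. Note that the whole page still gets built before the check runs, which is the inefficiency mentioned above.

```php
<?php
// Hypothetical helpers for illustration; neither is a Drupal API.

// Runs $render, captures everything it prints, and returns it as a string.
function capture_output($render) {
  ob_start();
  call_user_func($render);
  return ob_get_clean();
}

// Returns TRUE if the XPath expression matches at least one node in $html.
function html_has_node($html, $expression) {
  if (trim($html) === '') {
    return FALSE;
  }
  // Real-world HTML is rarely valid, so keep libxml warnings internal.
  $previous = libxml_use_internal_errors(TRUE);
  $dom = new DOMDocument();
  $loaded = $dom->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  if (!$loaded) {
    return FALSE;
  }
  $xpath = new DOMXPath($dom);
  $nodes = $xpath->query($expression);
  return $nodes !== FALSE && $nodes->length > 0;
}

// Usage: buffer the page, inspect it, then release it.
// The echoed markup is a stand-in for the real page rendering.
$html = capture_output(function () {
  echo '<html><body><div id="block-views-about-this-site-block">'
    . '<div><div><div>placeholder</div></div></div></div></body></html>';
});
$not_ready = html_has_node($html, "//*[@id='block-views-about-this-site-block']/div/div/div");

// If $not_ready is TRUE, this is the point to send the 503 to bots and exit:
// nothing has been flushed yet, so headers can still be set.
echo $html;
```

Because the output is held in the buffer until the final echo, the 503 headers can still be sent after the check.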

Hope that helps