How to proxy an external site in Express

I am trying to make a proxy server that loads external websites under my domain. My goal with this is to let people visit myDomain.com/anyDomain.com and be able to use anyDomain.com with added functionality (injected JS).

I tried using the Request package to fetch the HTML of the site and then send it as the Express response, but this approach breaks the site (relative paths, missing CSS, bad JS requests, etc.).

Is there any node package that accomplishes this task? If not, how can I do it myself?

Thank you!

There are 2 answers

This is probably not legal, so disclaimer: DON'T USE THIS CODE.

The following is a very hacky example of how you could do this by using cookies to track the proxied host for any relative URLs.

Basically, any time the URL path matches /*.com/*, we run a regex on it and set a cookie proxy_host to whatever matches *.com. If the URL path does not match, we check whether the cookie proxy_host has been set. If it has, we append the URL path to the host stored in proxy_host and proxy that URL.

var app = require('express')();
var request = require('request');
var cookieParser = require('cookie-parser');

// Matches the first run of non-slash characters ending in .com
var HOST_RE = /([^/]+\.com)/;

app.use(cookieParser());

app.use(function(req, res) {
  // Check if the url matches *.domain/www.somehost.com/*
  if (HOST_RE.test(req.path)) {
    // get a match for the host only, no paths
    var proxyHost = HOST_RE.exec(req.path)[0];
    // clean the url of the host (and the slash before it), so that we
    // can proxy the exact page the user requested, query string included
    var path = req.url.replace('/' + proxyHost, '');

    // We have to cache the body in this instance because before we
    // send the proxied response, we need to set our cookie `proxy_host`.
    // Collect raw Buffers so binary responses are not corrupted.
    var chunks = [];
    return request.get('http://' + proxyHost + path)
      .on('error', function() {
        // upstream could not be reached
        res.status(502).end();
      })
      .on('data', function(data) {
        chunks.push(data);
      })
      .on('end', function() {
        res.cookie('proxy_host', proxyHost);
        res.end(Buffer.concat(chunks));
      });
  }

  // Url does not match *.domain/www.somehost.com/*
  // most likely a relative url. If we have the `proxy_host`
  // cookie set, just proxy `http://` + proxy_host + `req.url`
  if (req.cookies && req.cookies.proxy_host) {
    return request.get('http://' + req.cookies.proxy_host + req.url)
      .on('error', function() {
        res.status(502).end();
      })
      .pipe(res);
  }

  // otherwise 404
  res.status(404).end();
});

app.listen(8000);
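
The routing decision described above (host in the path, otherwise fall back to the cookie) can also be pulled out into a pure function, which makes it easier to test without touching the network. This is only a sketch: `resolveTarget` is a hypothetical helper, not part of Express or request, and it assumes the same `.com`-only pattern and plain `http://` as the code above, with the host anchored to the first path segment.

```javascript
// Hypothetical helper: decide which upstream URL an incoming request maps to.
// Same .com-only idea as HOST_RE above, but anchored to the first path segment.
var FIRST_SEGMENT_HOST_RE = /^\/([^/?]+\.com)(?=[/?]|$)/;

function resolveTarget(url, cookieHost) {
  var match = FIRST_SEGMENT_HOST_RE.exec(url);
  if (match) {
    // Host is in the URL: strip "/host" and keep the rest (path + query).
    var rest = url.slice(match[0].length) || '/';
    if (rest.charAt(0) === '?') {
      rest = '/' + rest;
    }
    return { host: match[1], url: 'http://' + match[1] + rest, setCookie: true };
  }
  if (cookieHost) {
    // Most likely a relative URL requested by a previously proxied page.
    return { host: cookieHost, url: 'http://' + cookieHost + url, setCookie: false };
  }
  // Nothing to proxy: the caller should respond with 404.
  return null;
}
```

The middleware would then call `resolveTarget(req.url, req.cookies.proxy_host)` and only set the `proxy_host` cookie when `setCookie` is true.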
Morten Olsen

This should get you started. It lets you call, for example, http://localhost:6008/www.example.com/hello/world?foo=bar, which then proxies http://www.example.com/hello/world?foo=bar. But if you are going to proxy other webpages, you are going to hit all sorts of problems.

First off, it might not be legal. I am not aware of the rules around proxying pages and altering them, so you should check the laws regarding your specific use case.

Secondly, a lot of content on webpages uses absolute URLs (especially if the content is spread across multiple domains for things like CDNs and APIs). These resources will still point back to the original destination, which might very well cause quite a few headaches.

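var express = require('express'),
    http = require('http'),
    url = require('url'),
    app = express();

// Route: /<host>/<rest of path>, e.g. /www.example.com/hello/world
app.get('/:host*', function (request, response) {

    var proxyurl = url.parse(request.url);
    // Everything after the host; default to the site root
    var path = request.params[0] || '/';
    if (proxyurl.search) {
        path += proxyurl.search;
    }

    http.get({
        host: request.params.host,
        path: path,
        headers: {}
    }, function(res) {
        // Collect raw Buffers so binary content is not corrupted
        var chunks = [];

        res.on('data', function(chunk) {
            chunks.push(chunk);
        });

        res.on('end', function() {
            response.end(Buffer.concat(chunks));
        });
    }).on('error', function(e) {
        console.log("Got error: ", e);
        // Answer the client instead of leaving the request hanging
        response.status(502).end();
    });
});

app.listen(6008);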
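
For the absolute-URL problem mentioned above, one option is to rewrite `href` and `src` attributes in the proxied HTML so that they route back through the proxy. The following is a rough sketch only: `rewriteAbsoluteUrls` and its `proxyOrigin` parameter are hypothetical names, the regex handles only double-quoted `href`/`src` attributes, and it will miss URLs inside CSS, inline scripts, and `srcset`. It also drops the original scheme, since the proxy above always fetches over plain HTTP. A real HTML parser would be more robust.

```javascript
// Hypothetical helper: point absolute http(s) URLs in href/src attributes
// back at the proxy, using the /host/path scheme from the code above.
function rewriteAbsoluteUrls(html, proxyOrigin) {
  // Matches href="http://host/rest" or src="https://host/rest" (double quotes only).
  var ATTR_RE = /\b(href|src)="https?:\/\/([^/"]+)([^"]*)"/gi;
  return html.replace(ATTR_RE, function (match, attr, host, rest) {
    return attr + '="' + proxyOrigin + '/' + host + rest + '"';
  });
}

var page = '<link href="https://cdn.example.com/app.css"><a href="/about">About</a>';
console.log(rewriteAbsoluteUrls(page, 'http://localhost:6008'));
```

Relative links such as `/about` are left untouched here, so they would still need something like the cookie approach from the other answer.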