How to obtain a cookie from a remote domain using Greasemonkey?


I'm writing a Greasemonkey (v2.3) script that basically screen-scrapes the contents served by lema.rae.es/drae/srv/search, for lack of an API of any sort.

The thing is, I want to query that URL from Google Translate, a different domain. For that, I can use GM_xmlhttpRequest without problems, but a GET request to a specific URL (for instance lema.rae.es/drae/srv/search?val=test) yields an HTML page with a hidden form that gets POSTed after calling the challenge() JavaScript function, which calculates some sort of token that is passed along in the POST request.

Obviously, this happens asynchronously and Greasemonkey sees nothing of it. Through trial and error I have realised that if my browser (Iceweasel 31.2.0) has a cookie for lema.rae.es, then the GET request issued using GM_xmlhttpRequest actually returns what I want: the HTML of the definition of the word passed as the "val" parameter in the URL. However, if I delete all cookies for lema.rae.es, the GET request returns the aforementioned hidden form.

In short, I need a way to receive the response of that POST request from within Greasemonkey. I believe that if I could receive the cookie from the server and store it, I could include it as a request header in a further request and it should work as I expect. Alternatively, the cookie could simply be stored in the browser, in which case it would be sent as a header automatically when I trigger GM_xmlhttpRequest.

I tried a different solution, namely a hidden iframe, but the browser blocked the creation of such an iframe on the grounds of the same-origin policy, even after configuring the userscript to run on both domains.

Hopefully I've made clear what I want to achieve, and I hope somebody can point me in the right direction.

On a side note: if someone could explain what the challenge() function calculates, I would really appreciate it. My hypothesis is that the token it generates gets sent to the server, which in turn uses it to produce the cookie, but that sounds overly complicated...

Answer by Brock Adams (best answer)

The hidden iframe route is the way to go, but it is being blocked by translate.google.com in this case.

Here is an alternate approach to ensure that Firefox has the fresh cookies it needs to keep your mashup site (lema.rae.es) happy:

  1. Find some source HTML that is present when the mashup site wants fresh cookies, but is absent otherwise.
    In this case, the JS source function challenge will do.

  2. Make the GM_xmlhttpRequest to the mashup site and test the response.

  3. If the GM_xmlhttpRequest response has the desired data, parse it as desired.
    Done!

  4. If the GM_xmlhttpRequest response has the "needs cookies" source, open a special query of the mashup site in a popup window:

    1. Since the site is opening in its own window, it won't be blocked by cross-origin safeguards.
    2. Set the GM script to also operate on this special URL.
    3. For the special URL, wait until key nodes or text are present, indicating that the page has finished loading and the cookies are set.
    4. Once the popup has finished, it sends a message to the opening page and then closes itself.
    5. When the opening page gets the message, it re-GM_xmlhttpRequests the mashup page.
    6. Parse it and Done!
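
The decision made in steps 2 through 4 can be sketched as a small pure function. This is only an illustration: classifyResponse and its return values are hypothetical names, not part of Greasemonkey or of the script in this answer. It assumes, as described above, that the text "function challenge" appears in the response only when the site wants fresh cookies.

```javascript
// Hypothetical helper (not part of Greasemonkey): decides which step of the
// procedure applies to a GM_xmlhttpRequest-style response.
function classifyResponse(status, responseText) {
    // Anything other than 200 or 304 is treated as a failed request.
    if (status !== 200 && status !== 304) {
        return "error";
    }
    // Assumption: the challenge() source is present only when the site
    // wants fresh cookies, so its presence means "open the popup".
    if (/function\s+challenge/i.test(responseText)) {
        return "needs-cookies";
    }
    // Otherwise the response should contain the dictionary entry.
    return "has-data";
}

console.log(classifyResponse(200, "<script>function challenge(){}</script>")); // needs-cookies
console.log(classifyResponse(200, "<div>definition</div>"));                   // has-data
console.log(classifyResponse(503, ""));                                         // error
```

Keeping this check separate from the network call makes it easy to try against saved copies of both kinds of response before wiring it into the onload handler.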


Here is a complete Greasemonkey script, tested on Firefox, that mashes lema.rae.es/drae/srv/search into translate.google.com:

// ==UserScript==
// @name     _GM_xmlhttpRequest that needs cookie(s)
// @include  https://translate.google.com/*
// @include  http://lema.rae.es/drae/srv/search?val=openedByGM
// @require  http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js
// @require  https://gist.github.com/raw/2625891/waitForKeyElements.js
// @grant    GM_xmlhttpRequest
// ==/UserScript==

//--- Global variables
var mashupURL   = "http://lema.rae.es/drae/srv/search?val=test";
var cookGenURL  = "http://lema.rae.es/drae/srv/search?val=openedByGM";

if (location.href == cookGenURL) {
    //--- May be best we can do until Greasemonkey fixes tab handling flaws.
    document.title = "Close me!";

    if (window.opener) {
        waitForKeyElements ("i:contains(openedByGM)", closePopupIfCan);
    }
}
else {
    attemptMashup ();

    window.addEventListener ("message", receiveCookieMessage, false);
}

//-- Just functions from here on down...

function closePopupIfCan (jNode) {
    window.opener.postMessage ("Cookie(s) should be set!", "*");
    window.close ();
}

function attemptMashup () {
    GM_xmlhttpRequest ( {
        method:     "GET",
        url:        mashupURL,
        onload:     parseDictionaryResponse,
        onabort:    reportAJAX_Error,
        onerror:    reportAJAX_Error,
        ontimeout:  reportAJAX_Error
    } );
}

function receiveCookieMessage (event) {
    if (event.origin != "http://lema.rae.es")     return;

    console.log ("message ==> ", event.data);

    /*--- Now that we have the cookie(s), re-attempt the mashup, but allow
        a little settling time first.
    */
    setTimeout (attemptMashup, 888);
}

function parseDictionaryResponse (respObject) {
    if (respObject.status != 200  &&  respObject.status != 304) {
        reportAJAX_Error (respObject);
        return;
    }
    /*--- If the required cookie is not present/valid, open the target page
        in a temporary tab to set the cookies. The request is retried when
        that tab reports back (see receiveCookieMessage).

        The test string is unique to the scraped site and is only present
        when cookie(s) is/are needed.
    */
    if (/function\s+challenge/i.test (respObject.responseText) ) {
        window.open (cookGenURL);
        return;
    }

    //--- Don't use jQuery to parse this!
    var parser      = new DOMParser ();
    var responseDoc = parser.parseFromString (
        respObject.responseText, "text/html" // Firefox only, for now.
    );

    //--- Get site-specific payload and put in site-specific location.
    var payload     = responseDoc.querySelectorAll ("body > div");
    $("#gt-form-c").before (payload);
}

function reportAJAX_Error (respObject) {
    alert (
        'Error ' + respObject.status + '!  "' + respObject.statusText + '"'
    );
}
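
If you want the message handling to be a little stricter, one option is to check the message text as well as the origin before retrying. The following is a minimal sketch, not part of the tested script: isCookieMessage, POPUP_ORIGIN, and COOKIE_MESSAGE are hypothetical names, and the values are simply copied from the script above.

```javascript
// Values copied from the script above; adjust them if either site changes.
var POPUP_ORIGIN   = "http://lema.rae.es";
var COOKIE_MESSAGE = "Cookie(s) should be set!";

// Hypothetical helper: returns true only for the message that the popup
// sends once its page has finished loading.
function isCookieMessage(event) {
    return event.origin === POPUP_ORIGIN && event.data === COOKIE_MESSAGE;
}

console.log(isCookieMessage({ origin: POPUP_ORIGIN, data: COOKIE_MESSAGE }));          // true
console.log(isCookieMessage({ origin: "http://example.com", data: COOKIE_MESSAGE })); // false
console.log(isCookieMessage({ origin: POPUP_ORIGIN, data: "something else" }));       // false
```

In receiveCookieMessage this would replace the bare origin comparison. Along the same lines, the popup could pass "https://translate.google.com" as the second argument to postMessage instead of "*", so that the message is only delivered to the expected opener.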