MongoDB mapReduce does incorrect reducing


I'm running into some trouble with a very simple mapReduce and can't figure out what I've done wrong. I'm trying to merge two collections. The first, db.Pos, looks like this:

"chr" : "chr1", "begin" : 39401, "end" : 39442

The other collection, db.Gene, has the following format:

"chr" : "chr1", "begin" : 39401, "end" : 39442, "gene" : "GENE1"

My code looks like this:

var mapPos = function(){
    emit({chr: this.chr, begin:this.begin, end:this.end},{gene:""});
}

var mapGene = function() {
    emit({chr: this.chr, begin:this.begin, end:this.end},{gene:this.gene});
}

r = function(key,values){
    var result = {gene:""};
    values.forEach(function(value){
        result.gene = value.gene;
    });
    return result;
}

res = db.Pos.mapReduce(mapPos, r, {out: {reduce: 'joined'}});
res = db.Gene.mapReduce(mapGene, r, {out: {reduce: 'joined'}});

What I'd like to see is a collection in which entries that match on chr, begin, and end are merged, with the gene field filled in from the db.Gene collection.

Instead, the "gene" field in my "joined" collection ends up set to something other than the empty string, even when there is no matching document in db.Gene that has a gene field.

What did I do wrong?

Answer by Anthonny

After some reflection, I think you should use merge rather than reduce as your out mode.
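With the question's map and reduce functions, that amounts to changing the out option of both calls, e.g. `db.Gene.mapReduce(mapGene, r, {out: {merge: 'joined'}})`. The difference between the two output modes can be modelled in plain JavaScript. This is a hypothetical simulation of the documented behaviour, not MongoDB code: a `Map` stands in for the `joined` collection, `writeOutput` is a made-up helper, and the second key is invented data. In `merge` mode an existing document with the same key is replaced; in `reduce` mode the reduce function is called on the existing and the new value.

```javascript
// Hypothetical model of mapReduce output modes; a Map plays the role of 'joined'.
function writeOutput(collection, results, mode, reduceFn) {
    results.forEach(function (doc) {
        var id = JSON.stringify(doc._id); // the emitted key becomes the _id
        if (mode === "reduce" && collection.has(id)) {
            // reduce mode: combine the stored value and the new one with the reduce function
            collection.set(id, reduceFn(doc._id, [collection.get(id), doc.value]));
        } else {
            // merge mode (or a key not yet stored): the new result replaces the old document
            collection.set(id, doc.value);
        }
    });
}

var joined = new Map();
var k1 = { chr: "chr1", begin: 39401, end: 39442 }; // present in db.Pos and db.Gene
var k2 = { chr: "chr1", begin: 50000, end: 50040 }; // hypothetical, only in db.Pos

// Output of the db.Pos run, then of the db.Gene run, both written with merge
writeOutput(joined, [{ _id: k1, value: { gene: "" } }, { _id: k2, value: { gene: "" } }], "merge");
writeOutput(joined, [{ _id: k1, value: { gene: "GENE1" } }], "merge");

console.log(joined.get(JSON.stringify(k1)).gene);                 // GENE1
console.log(JSON.stringify(joined.get(JSON.stringify(k2)).gene)); // ""
```

Merge works here because each db.Gene result already carries the complete value for its key, so nothing from the stored db.Pos document needs to be combined with it.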


Why you don't get the right value:

The problem occurs when the reduce function is applied to combine the content already in the joined collection with the results of db.Gene.mapReduce.

The reduce function doesn't know which value is the newest, so the result.gene it returns is simply the value.gene of the last element of the values array, and the order of that array is not guaranteed.
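To see the order dependence concretely, here is the question's reduce function `r` run in plain JavaScript (no database involved) on the two values MongoDB passes for a key found in both collections: the `{gene: ""}` already stored in `joined` and the `{gene: "GENE1"}` coming from db.Gene. The key is example data. MongoDB's documentation requires a reduce function to give the same result whatever the order of the values array, and `r` does not.

```javascript
// The reduce function from the question, unchanged apart from formatting.
var r = function (key, values) {
    var result = { gene: "" };
    values.forEach(function (value) {
        result.gene = value.gene; // whichever value comes last wins
    });
    return result;
};

var key = { chr: "chr1", begin: 39401, end: 39442 }; // example key present in both collections
var fromJoined = { gene: "" };    // value stored in 'joined' by the db.Pos run
var fromGene = { gene: "GENE1" }; // value emitted by the db.Gene run

var joinedFirst = r(key, [fromJoined, fromGene]);
var geneFirst = r(key, [fromGene, fromJoined]);

console.log(joinedFirst.gene);              // GENE1
console.log(JSON.stringify(geneFirst.gene)); // ""
```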

To mark the value that should override the one already in the collection, you can add a flag.

res = db.Pos.mapReduce(
    function() {
        emit({chr: this.chr, begin:this.begin, end:this.end},{gene:this.gene || ''});
    }, 
    function(key,values){
        var result = {};
        values.forEach(function(value){
            if (value)
                result.gene = value.gene;
        });
        // A reduce function must return its result
        return result;
    }, 
    {out: {reduce: 'joined'}}
);

res = db.Gene.mapReduce(
    function() {
        //Add a flag override here
        emit({chr: this.chr, begin:this.begin, end:this.end},{gene:this.gene || '', override: true});
    }, 
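    function(key,values){
        var result = {};
        values.forEach(function(value){
            // Only a value flagged as coming from db.Gene may set the gene
            if (value.override) {
                result.gene = value.gene;
                // Keep the flag so this result still wins if it is reduced again
                result.override = true;
            }
        });
        return result;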
    }, 
    {out: {reduce: 'joined'}}
);
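The flag makes the outcome independent of the order in which MongoDB passes the values. A quick check in plain JavaScript, using a standalone copy of the db.Gene reducer that also carries `override` into its result. Carrying the flag is an assumption on my part: since a reduce function can be called again on its own output, a result without the flag would no longer count as coming from db.Gene.

```javascript
// Standalone copy of the db.Gene reducer; it keeps the override flag in its result.
var reduceGene = function (key, values) {
    var result = {};
    values.forEach(function (value) {
        if (value.override) {
            result.gene = value.gene;
            result.override = true;
        }
    });
    return result;
};

var key = { chr: "chr1", begin: 39401, end: 39442 }; // example key
var existing = { gene: "" };                      // stored by the db.Pos run
var incoming = { gene: "GENE1", override: true }; // emitted by the db.Gene run

var r1 = reduceGene(key, [existing, incoming]);
var r2 = reduceGene(key, [incoming, existing]);
// Re-reducing an earlier result together with the stored value still keeps the gene
var r3 = reduceGene(key, [existing, reduceGene(key, [incoming])]);

console.log(r1.gene, r2.gene, r3.gene); // GENE1 GENE1 GENE1
```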

Hope it's clear :)