Passing all collections of a MongoDB database as input to a Hadoop MapReduce job

I need to pass all the collections of my MongoDB database as input to a Hadoop MapReduce job. There is a builder that accepts multiple inputs:

    MultiCollectionSplitBuilder mcsb = new MultiCollectionSplitBuilder();
    mcsb.add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
            (MongoURI)null, // authuri
            true, // notimeout
            (DBObject)null, // fields
            (DBObject)null, // sort
            (DBObject)null, // query
            false,
            MultiMongoCollectionSplitter.class)
    .add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
            (MongoURI)null, // authuri
            true, // notimeout
            (DBObject)null, // fields
            (DBObject)null, // sort
            new BasicDBObject("_id", new BasicDBObject("$gt", new Date(883440000000L))),
            false, // range query
            MultiMongoCollectionSplitter.class);

But I have around 10 collections in my database, and the example above only passes two collections. All I need is to read every collection in the mapper; my reducer will be the same for all of them.

Any help is appreciated.

There is 1 answer

Alan Spencer On

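You can keep adding to the MultiCollectionSplitBuilder; it is not limited to two inputs. Each add(...) call can be chained onto the previous one, so use one call per collection and point each URI at the collection you want (the example reuses the same URI only for brevity):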

    MultiCollectionSplitBuilder mcsb = new MultiCollectionSplitBuilder();
    mcsb
            .add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
                    (MongoURI) null, // authuri
                    true, // notimeout
                    (DBObject) null, // fields
                    (DBObject) null, // sort
                    (DBObject) null, // query
                    false, // range query
                    MultiMongoCollectionSplitter.class
            )
            .add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
                    (MongoURI) null, // authuri
                    true, // notimeout
                    (DBObject) null, // fields
                    (DBObject) null, // sort
                    new BasicDBObject("_id", new BasicDBObject("$gt", new Date(883440000000L))),
                    false, // range query
                    MultiMongoCollectionSplitter.class
            )
            .add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
                    (MongoURI) null, // authuri
                    true, // notimeout
                    (DBObject) null, // fields
                    (DBObject) null, // sort
                    new BasicDBObject("_id", new BasicDBObject("$gt", new Date(883440000000L))),
                    false, // range query
                    MultiMongoCollectionSplitter.class
            )
    ;
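
With around 10 collections, writing each add(...) call by hand gets repetitive. A minimal sketch of one way to avoid that is to build one input URI per collection in a loop and pass each URI to the builder. The helper class CollectionUris and the collection names "orders" and "customers" are hypothetical, made up for illustration. The sketch uses only the standard library, so the call into the builder is shown as a comment rather than executed. In a real job the names could come from the driver (for example the legacy DB.getCollectionNames() call) instead of a hard-coded list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of mongo-hadoop): builds one input URI per collection.
class CollectionUris {

    // Returns "mongodb://host:port/db.collection" for every name, in the given order.
    static List<String> build(String host, int port, String db, List<String> collections) {
        List<String> uris = new ArrayList<>();
        for (String name : collections) {
            uris.add("mongodb://" + host + ":" + port + "/" + db + "." + name);
        }
        return uris;
    }

    public static void main(String[] args) {
        // Assumed collection names, for illustration only.
        List<String> names = Arrays.asList("yield_historical.in", "orders", "customers");
        for (String uri : build("localhost", 27017, "mongo_hadoop", names)) {
            // In the real job, each URI would go into the builder from the answer:
            // mcsb.add(new MongoURI(uri), (MongoURI) null, true, (DBObject) null,
            //          (DBObject) null, (DBObject) null, false,
            //          MultiMongoCollectionSplitter.class);
            System.out.println(uri);
        }
    }
}
```

Because the loop applies the same arguments to every collection, this only fits the case in the question where no per-collection query, sort, or field selection is needed; collections that need their own query would still get an explicit add(...) call.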