Find and change all date type fields in a MongoDB collection


I have a collection with multiple date type fields. I know I can change them one by one based on their keys, but is there a way to find all fields of date type and change all of them in one script?

UPDATE

Many thanks to chridam for helping me out. Based upon his code I came up with this solution. (Note: I have MongoDB 3.2.9, and some code snippets from chridam's answer just wouldn't run for me. They might be valid, but they didn't work on my version.)

// map function: emit the name of every key whose value is a Date
map = function() {
    for (var key in this) { 
        if (key != null && this[key] != null && this[key] instanceof Date){ 
            emit(key, null); 
        }
    }
}

collectionName = "testcollection_copy";

mr = db.runCommand({
    "mapreduce": collectionName,
    "map": map,  
    "reduce": function() {},
    "out": "map_reduce_test" // out is required
}) 

dateFields = db[mr.result].distinct("_id")
printjson(dateFields)

// updating documents
db[collectionName].find().forEach(function (document){
    for (var i = 0; i < dateFields.length; i++){
        // skip documents that lack the field or hold a non-date value
        if (document[dateFields[i]] instanceof Date) {
            document[dateFields[i]] = new NumberLong(document[dateFields[i]].getTime());
        }
    } 
    db[collectionName].save(document);
});

Since projection didn't work for me, I used the above code to update the documents. My only question is: why use bulkWrite?

(Also, getTime() seemed cleaner than subtracting dates.)
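
For reference, both conversions produce the same millisecond value; here is a quick shell check with an arbitrary example date:

var d = new Date("2017-01-01");
d.getTime();                 // 1483228800000
d - new Date("1970-01-01");  // 1483228800000 (same value via date subtraction)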

1 Answer

chridam (Best Answer)

An operation like this involves two tasks: one to get a list of fields with the date type via MapReduce, and the next to update the collection via the aggregation framework or bulk write operations.

NB: The following methodology assumes all the date fields are at the root level of the document, not inside embedded documents or arrays.
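
For example, in a hypothetical document like the one below, only the root-level date field would be picked up:

{
    "_id": 1,
    "registerDate": ISODate("2016-05-18T00:00:00Z"),            // detected (root level)
    "audit": { "createdAt": ISODate("2016-05-18T00:00:00Z") }   // not detected (embedded)
}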

MapReduce

The first thing you need to do is run the following mapReduce operation. This checks whether each property of every document in the collection is of date type and returns a distinct list of the date fields:

// define helper function to determine if a key is of Date type
isDate = function(dt) {
    return dt && dt instanceof Date && !isNaN(dt.valueOf());
}

// map function
map = function() {
    for (var key in this) { 
        if (isDate(this[key]))
            emit(key, null); 
    }
}

// variable with collection name
collectionName = "yourCollectionName";

mr = db.runCommand({
    "mapreduce": collectionName,
    "map": map,  
    "reduce": function() {},
    "out": "map_reduce_out", // out is required
    "scope": { "isDate": isDate } // shell-defined helpers are not visible to map on the server unless passed via scope
}) 

dateFields = db[mr.result].distinct("_id")
printjson(dateFields)

// output: [ "validFrom", "validTo", "registerDate" ]

Option 1: Update collection via aggregation framework

You can use the aggregation framework to update your collection, in particular the $addFields pipeline stage, available in MongoDB 3.4 and newer. If your MongoDB server version does not support it, you can update your collection with the workaround described in Option 2 below.
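
You can check the server version from the shell before choosing an option (db.version() is a standard shell helper):

// $addFields requires MongoDB 3.4+
db.version()    // e.g. "3.2.9" --> use Option 2 instead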

The timestamp is calculated by using the $subtract arithmetic aggregation operator with the date field as minuend and the Unix epoch date new Date("1970-01-01") as subtrahend; subtracting two dates yields the difference in milliseconds.
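
As a quick illustration of the idea, using a hypothetical throwaway collection (subtractDemo is just an example name):

db.subtractDemo.insert({ "d": new Date("2017-01-01") })
db.subtractDemo.aggregate([
    { "$project": { "millis": { "$subtract": [ "$d", new Date("1970-01-01") ] } } }
])
// --> { "_id": ObjectId("..."), "millis": NumberLong("1483228800000") }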

The resulting documents of the aggregation pipeline are then written to the same collection via the $out pipeline stage, thus updating the collection with the new fields.

In essence, you'd want to end up running the following aggregation pipeline which converts the date fields to timestamps using the above algorithm:

pipeline = [
    {
        "$addFields": {
            "validFrom": { "$subtract": [ "$validFrom", new Date("1970-01-01") ] },
            "validTo": { "$subtract": [ "$validTo", new Date("1970-01-01") ] },
            "registerDate": { "$subtract": [ "$registerDate", new Date("1970-01-01") ] }
        }
    },
    { "$out": collectionName }
]
db[collectionName].aggregate(pipeline)

You can dynamically create the above pipeline array given the list of the date fields as follows:

var addFields = { "$addFields": { } },
    output = { "$out": collectionName };

dateFields.forEach(function(key){
    var subtr = ["$"+key, new Date("1970-01-01")];
    addFields["$addFields"][key] = { "$subtract": subtr };
});

db[collectionName].aggregate([addFields, output])
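
As a sanity check, you can print the dynamically built stage; it should mirror the hand-written pipeline above:

printjson(addFields)
// --> { "$addFields": { "validFrom": { "$subtract": [ "$validFrom", ISODate("1970-01-01T00:00:00Z") ] }, ... } }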

Option 2: Update collection via Bulk

Since this option is a workaround for when the $addFields stage above is not supported, you can use a $project pipeline stage to create the new timestamp fields with the same $subtract implementation. Instead of writing the results to the same collection, iterate the cursor returned by aggregate() with the forEach() method and, for each document, update the collection using the bulkWrite() method.

The following example shows this approach:

ops = []
pipeline = [
    {
        "$project": {
            "validFrom": { "$subtract": [ "$validFrom", new Date("1970-01-01") ] },
            "validTo": { "$subtract": [ "$validTo", new Date("1970-01-01") ] },
            "registerDate": { "$subtract": [ "$registerDate", new Date("1970-01-01") ] }
        }
    }
]

db[collectionName].aggregate(pipeline).forEach(function(doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": {
                "$set": { 
                    "validFrom": doc.validFrom,
                    "validTo": doc.validTo,
                    "registerDate": doc.registerDate
                }
            }
        }
    });

    if (ops.length === 500) {
        // flush the queued updates in batches of 500 to keep the ops array bounded
        db[collectionName].bulkWrite(ops);
        ops = [];
    }
})

if (ops.length > 0)  
    db[collectionName].bulkWrite(ops);

Using the same approach as in Option 1 above, create the $project pipeline stage and the bulk update objects dynamically:

var ops = [],
    project = { "$project": { } };

dateFields.forEach(function(key){
    var subtr = ["$"+key, new Date("1970-01-01")];
    project["$project"][key] = { "$subtract": subtr };
});

// build a { "$set": { ... } } update document from the given list of keys
setDocFields = function(doc, keysList) { 
    var setObj = { "$set": { } };
    return keysList.reduce(function(obj, key) {  
        obj["$set"][key] = doc[key];
        return obj;
    }, setObj )
}

db[collectionName].aggregate([project]).forEach(function(doc) {
    ops.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": setDocFields(doc, dateFields)
        }
    });

    if (ops.length === 500) {
        db[collectionName].bulkWrite(ops);
        ops = [];
    }
})

if (ops.length > 0)  
    db[collectionName].bulkWrite(ops);