MongoDB Collection update: initialize a document with default values

I am trying to deal with time series using MongoDB. The common solution adopted by the community is to use subdocuments to store information at different levels of granularity (see Schema Design for Time Series Data in MongoDB).

For example, take a look at the following document:

{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  type: "memory_used",
  values: [
    999999,   // 1 second
    …
    1000000,  // nth second
    1500000,  // n+1th second
    … 
    2000000   // 60th
  ]
}

The document is indexed by the minute and contains a subdocument that stores more detailed information for each second.

So far so good. This kind of approach requires an optimization to work properly:

Another optimization [..] is preallocating all documents for the upcoming time period; This never causes an existing document to grow or be moved on disk.
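For instance, the preallocated document for a given minute could be built as follows (a minimal sketch, assuming the collection and field names used above):

var zeroedValues = {};
for (var s = 0; s < 60; s++) {
  zeroedValues[s.toString()] = 0;  // one counter per second, already set to zero
}

db.getCollection('aCollection').insert({
  timestamp_minute: ISODate("2013-10-10T23:07:00.000Z"),  // the upcoming minute
  type: "memory_used",
  values: zeroedValues
});

Since the document already contains all 60 counters, each subsequent $inc only changes a value in place and never makes the document grow.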

To implement the above optimization in a single statement, one could use the $setOnInsert operator in the update method.

db.getCollection('aCollection').update(
    {
      timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
      type: "memory_used"
    },
    {
      $setOnInsert: { values: {'0': 0, '1': 0, '2': 0}},
      $inc: {"values.30": 1}
    },
    { upsert: true }
)

The problem is that it is not possible to reference the same field in two different operators of the same update. The above update statement generates the following error:

Cannot update 'values' and 'values.30' at the same time

This problem is tracked in this issue.

My question is: is there any workaround? I should premise that I can't run a batch that preallocates empty documents, because I can't know the values of the indexed fields a priori (in the above example, the values of the field type).

Thanks in advance.

1 Answer

riccardo.cardin (best answer)

My colleagues and I found a workaround. We can call it three-step initialization.

Remember that MongoDB guarantees the atomicity of operations on a single document. With this fact in mind, we can operate in the following way:

  1. Try to update the document, properly incrementing the counters for the specified time chunk. Do not upsert; just run an old-fashioned update operation. Remember that the execution of an update statement returns the number of documents modified. If the number of documents modified is greater than zero, you're done.
  2. If the number of documents modified by the update is zero, it means that the document to update is not yet present in the collection. Try to insert the whole document for the specified tags, putting all the counters (field values) to zero. The execution of an insert statement also returns the number of documents written. If it returns zero or throws an exception, never mind: it means that some other process has already inserted the document for the same tags.
  3. Execute the same update as in step 1 again.

The code should look something like the following snippet.

// First of all, try the update
var result = db.test.update(
  {timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), type: "memory_used"},
  {$inc: {"values.39": 1}},
  {upsert: false}
);
// If the update did not succeed, try to insert the document
if (result.nModified === 0) {
  try {
    db.test.insert(/* Put here the whole document */);
  } catch (err) {
    console.log(err);
  }
  // Here we are sure that the document exists.
  // Retry to execute the update statement
  db.test.update(/* Same update as above */);
}

The above procedure works only if a precondition holds: the _id value must be derived from the other fields in the document. In our example, the _id value would be '2013-10-10T23:06:00.000Z-memory_used'. Only with this technique will the insert at step 2 properly fail when the document already exists.
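As an illustration, here is a minimal sketch of the insert at step 2 with such a derived _id (the helper function preallocatedDoc and the collection name test are just assumptions based on the snippets above; the try/catch follows the same error-handling pattern as the answer's own code):

function preallocatedDoc(timestampMinute, type) {
  var values = {};
  for (var s = 0; s < 60; s++) {
    values[s.toString()] = 0;  // zeroed counter for each second of the minute
  }
  return {
    // e.g. "2013-10-10T23:06:00.000Z-memory_used"
    _id: timestampMinute.toISOString() + "-" + type,
    timestamp_minute: timestampMinute,
    type: type,
    values: values
  };
}

try {
  db.test.insert(preallocatedDoc(ISODate("2013-10-10T23:06:00.000Z"), "memory_used"));
} catch (err) {
  // A duplicate key error on _id here just means another process won the race:
  // the document already exists, so we can safely proceed with the update.
  print(err);
}

Because _id is unique, two concurrent processes cannot both succeed in inserting the document for the same minute and type; the loser gets a duplicate key error and simply retries the update.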