I am trying to work with time series in MongoDB. The common solution adopted by the community is to use subdocuments to store information at different levels of granularity (see Schema Design for Time Series Data in MongoDB).
For example, take a look at the following document:
{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  type: "memory_used",
  values: [
    999999, // 1 second
    …
    1000000, // nth second
    1500000, // n+1th second
    …
    2000000 // 60th
  ]
}
The document is indexed by minute and contains a subdocument that stores more detailed information for each second.
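To make the mapping concrete, here is a minimal sketch of how a single sample's timestamp could be split into the minute that identifies the document and the slot inside values. The helper name bucketFor is hypothetical, and the sketch assumes zero-based slots (0 to 59) computed in UTC.

```javascript
// Hypothetical helper: split a timestamp into the minute bucket
// (the document key) and the second slot inside `values`.
function bucketFor(date) {
  // Copy the date so the caller's object is not mutated.
  const minute = new Date(date.getTime());
  // Truncate seconds and milliseconds to get the start of the minute.
  minute.setUTCSeconds(0, 0);
  return { timestamp_minute: minute, second: date.getUTCSeconds() };
}

const bucket = bucketFor(new Date("2013-10-10T23:06:30.500Z"));
// bucket.timestamp_minute is 2013-10-10T23:06:00.000Z and bucket.second is 30,
// so this sample would update the field "values.30" of that minute's document.
```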
So far so good. This kind of approach requires an optimization to work properly:
Another optimization [..] is preallocating all documents for the upcoming time period; This never causes an existing document to grow or be moved on disk.
To implement the above optimization one could use the $setOnInsert operator in the update method.
db.getCollection('aCollection').update(
  {
    timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
    type: "memory_used"
  },
  {
    $setOnInsert: { values: {'0': 0, '1': 0, '2': 0}},
    $inc: {"values.30": 1}
  },
  { upsert: true }
)
The problem is that it is not possible to target the same field with two different operators in the same update. The above update instruction generates the following error:
Cannot update 'values' and 'values.30' at the same time
This problem is tracked on this issue.
My question is: is there any workaround? I should add that I can't use a batch job that preallocates empty documents, because I can't know the values of the indexed fields a priori (in the above example, the values of the field type).
Thanks in advance.
My colleagues and I found a workaround. We call it three-step initialization.
Remember that MongoDB guarantees the atomicity of operations on a single document. With this fact in mind we can operate in the following way:
The code should look something like the following snippet.
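A minimal sketch of such a three-step initialization, assuming the steps are: (1) a plain update without upsert, (2) if nothing matched, an insert of a preallocated document that is allowed to fail on a duplicate key, and (3) the same update again. It is written for mongosh, where updateOne returns a matchedCount and insertOne throws an error with code 11000 on a duplicate key. The helper names makeId, emptyValues and incrementSecond are hypothetical, and coll stands for any collection handle, for example db.getCollection('aCollection').

```javascript
// Hypothetical helper: derive _id from the indexed fields, so that two
// clients creating the same minute/type document collide on the _id index.
function makeId(minute, type) {
  return minute.toISOString() + "-" + type;
}

// Hypothetical helper: preallocated counters {"0": 0, ..., "59": 0}.
function emptyValues() {
  const values = {};
  for (let s = 0; s < 60; s++) values[String(s)] = 0;
  return values;
}

function incrementSecond(coll, minute, type, second) {
  const filter = { _id: makeId(minute, type) };
  const inc = { $inc: { ["values." + second]: 1 } };

  // Step 1: plain update, no upsert. Done if the document already exists.
  if (coll.updateOne(filter, inc).matchedCount > 0) return;

  // Step 2: the document is missing, so try to insert it fully preallocated.
  try {
    coll.insertOne({
      _id: filter._id,
      timestamp_minute: minute,
      type: type,
      values: emptyValues()
    });
  } catch (e) {
    // 11000 is the duplicate key error: another client inserted the
    // document first, which is fine. Anything else is a real failure.
    if (e.code !== 11000) throw e;
  }

  // Step 3: repeat the update; the document exists now either way.
  coll.updateOne(filter, inc);
}
```

Each of the three operations touches a single document, so each one is atomic on its own. The sketch does not make the whole sequence atomic; it only relies on the insert losing cleanly when another client gets there first.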
The above procedure works if a precondition holds: the _id value should be derived from other fields in the document. In our example, the _id value would be '2013-10-10T23:06:00.000Z-memory_used'. Only with this technique will the insert at point 2 properly fail.