Here's my problem:
I have a collection with ~900k entries, about 6 GB, and it will be updated daily, at least for a while. By "updated" I mean dumped and re-imported from a JSON file that we get every day (this may be just a temporary solution).
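For reference, the daily refresh is roughly this (database, collection, and file names are placeholders):

mongoimport --db mydb --collection main --drop --file daily_dump.json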
The problem I'm dealing with at the moment is keeping a history of 4-5 fields in a separate collection, which has to be updated before/after each "update" of the main collection.
What I did was fetch all the records and loop through the results, updating the history collection for each entry (a sketch follows the structure below), but running ~900k update queries is time- and resource-consuming, and I don't think it's good practice.
The structure I have in mind is something like this:
{
    unique_id: 'unique value from main collection',
    history: [
        {
            field1: value1,
            field2: value2,
            field3: 'value3',
            date_updated: '2013-12-03'
        },
        { ... }
    ]
}
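To make the loop I described concrete, here's roughly what it looks like against this structure: an upsert with $push that appends one history entry per document (collection and field names are placeholders):

db.main.find().forEach(function (doc) {
    db.history.update(
        { unique_id: doc.unique_id },
        { $push: { history: {
            field1: doc.field1,
            field2: doc.field2,
            field3: doc.field3,
            date_updated: new Date()
        } } },
        { upsert: true }
    );
});

The upsert creates the history document the first time a unique_id shows up; the cost is one round trip per entry, ~900k update calls per run.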
I tried mapReduce too, with output to a collection, which was a lot faster. The problem with mapReduce is that I can't really keep the history: none of the options for the "action" parameter does what I need. Each time it runs, it overwrites the entries; what I need it to do is something like an upsert.
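To illustrate, here's a rough sketch of a mapReduce with output to a collection (collection and field names are placeholders). If I understand the "reduce" output action correctly, it re-applies the reduce function to documents that already exist in the output collection, so a reduce that concatenates the history arrays is the closest thing to an upsert; note that mapReduce also nests the result under a "value" field rather than producing the structure above:

var mapFn = function () {
    // one history entry per main-collection document
    emit(this.unique_id, { history: [ {
        field1: this.field1,
        field2: this.field2,
        field3: this.field3,
        date_updated: new Date()
    } ] });
};

var reduceFn = function (key, values) {
    // concatenate the arrays so re-reducing against the existing
    // output document merges old and new history entries
    var merged = [];
    values.forEach(function (v) {
        merged = merged.concat(v.history);
    });
    return { history: merged };
};

db.main.mapReduce(mapFn, reduceFn, { out: { reduce: 'history' } });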
If anyone has ideas about how I can do this, I'll gladly put them to the test; I've run out of ideas, and I'm not a MongoDB guru either.
I don't think the import/update solution is good either, and I'm looking for a solution to that too, but first I want to get the history sorted out.