Why does MongoDB not support queries of properties of embedded documents that are stored in hashed arrays?

973 views Asked by At

Why does MongoDB not support queries of properties of embedded documents that are stored using hashes?

For example say you have a collection called "invoices" which was created like this:

db.invoices.insert(
    [
        {
            productsBySku: {
                12432: {
                    price: 49.99,
                    qty_in_stock: 4
                },
                54352: {
                    price: 29.99,
                    qty_in_stock: 5
                }
            }
        },
        {
            productsBySku: {
                42432: {
                    price: 69.99,
                    qty_in_stock: 0
                },
                53352: {
                    price: 19.99,
                    qty_in_stock: 5
                }
            }
        }
    ]
);

With such a structure, MongoDB queries with $elemMatch, dot syntax, or the positional operator ($) fail to access any of the properties of each productsBySku member.

For example you can't do any of these:

db.invoices.find({"productsBySku.qty_in_stock":0});

db.invoices.find({"productsBySku.$.qty_in_stock":0});

db.invoices.find({"productsBySku.qty_in_stock":{$elemMatch:{$eq:0}}});

db.invoices.find({"productsBySku.$.qty_in_stock":{$elemMatch:{$eq:0}}});

To find out-of-stock products therefore you have to resort to using a $where query like:

db.invoices.find({
    $where: function () {
        for (var i in this.productsBySku)
            if (!this.productsBySku[i].qty_in_stock)
                return this;
    }
});

On a technical level... why did they design MongoDB with this very severe limitation on queries? Surely there must be some kind of technical reason for this seeming major flaw. Is this inability to deal with an a list of objects as an array, ignoring the keys, just a limitation of JavaScript as a language? Or was this the result of some architectural decision within MongoDB?

Just curious.

2

There are 2 answers

2
Sammaye On BEST ANSWER

If I am honest I am not sure, you would have to ask MongoDB Inc. (10gen) themselves. I will attempt to explain some of my reasoning.

I have searched on Google a little and nothing seems to appear: https://www.google.co.uk/search?q=mognodb+jira+support+querying+objects&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a&channel=fflb&gfe_rd=cr&ei=as9pVOW3OMyq8wfhtYCgCw#rls=org.mozilla:en-GB:official&channel=fflb&q=mongodb+jira+querying+objects

It is quick to see how using objectual propeties for keys could be advantageous, for example: remove queries would not have to search every object and its properties within the array but instead just find the single object property in the parent object and unset it. Essentially it would be the difference of:

[
    {id:1, d:3, e:54},
    {id:3, t:6, b:56}
]

and:

{
    1: [d:3, e: 54],
    3: [t:6, b:56]
}

with the latter, obviously, being a lot quicker to delete an id of 3.

Not only that but all array operations that MongoDB introduces, from $elemMatch to $unwind would work wth objects as well, I mean how is unwinding:

[
    {id:5, d:4}
]

much different to unwinding:

{
    5: {d:4}
}

?

So, if I am honest, I cannot answer your question. There is no defense on Google as to their decision and there is no extensive talk from what I can find.

In fact I went as far as to search up on this a couple of times, including: https://www.google.co.uk/search?q=array+operations+that+do+not+work+on+objects&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a&channel=fflb&gfe_rd=cr&ei=DtNpVLrwDsPo7AaH4oCoDw and I found results that went as far as underscore.js who actually comply their array functions to all objects as well.

The only real reason, I can think of, is standardisation. Instead of catering to all minority groups etc on how subdocuments may work they just cater to a single minority turned majority by their choice.

It is one of the points about MongoDB which does confuse me even now, since there are many times within my own programming where it seems advantageous for speed and power to actually use objects instead of arrays.

5
Markus W Mahlberg On

As a rule of thumb: Usually, these problems aren't technical ones, but problems with data modeling. I have yet to find a use case where it makes sense to have keys hold semantic value.

If you had something like

'products':[
     {sku:12432,price:49.99,qty_in_stock:4},
     {sku:54352,price:29.99,qty_in_stock:5}
 ]

It would make a lot more sense.

But: you are modelling invoices. An invoice should – for many reasons – reflect a status at a certain point in time. The ever changing stock rarely belongs to an invoice. So here is how I would model the data for items and invoices

{
  '_id':'12432',
  'name':'SuperFoo',
  'description':'Without SuperFoo, you can't bar or baz!',
  'current_price':49.99
 }

Same with the other items.

Now, the invoice would look quite simple:

{ _id:"Invoice2",
  customerId:"987654"
  date:ISODate("2014-07-07T12:42:00Z"),
  delivery_address:"Foo Blvd 42, Appt 42, 424242 Bar, BAZ"
  items:
    [{id:'12432', qty: 2, price: 49.99},
     {id:'54352', qty: 1, price: 29.99}
    ]
}

Now the invoice would hold things that may only be valid at a given point in time (prices and delivery address may change) and both your stock and the invoices are queried easily:

// How many items of 12432 are in stock?
db.products.find({_id:'12432'},{qty_in_stock:1})

// How many items of 12432 were sold during July and what was the average price?
db.invoices.aggregate([
  {$unwind:"$items"},
  {
    $match:{
      "items.id":"12432",
      "date":{
         $gt:ISODate("2014-07-01T00:00:00Z"),
         $lt:ISODate("2014-08-01T00:00:00Z")
      }
     }
   },
   {$group : { _id:"$items.id", count: { $sum:"$items.qty" }, avg:{$avg:"$items.price"} } }
])

// How many items of each product sold did I sell yesterday?
db.invoices.aggregate([
  {$match:{ date:{$gte:ISODate("2014-11-16T00:00:00Z"),$lt:ISODate("2014-11-17T00:00:00Z")}}},
  {$unwind:"$items"},
  {$group: { _id:"$items.id",count:{$sum:"$qty"}}}
])

Combined with the query on how many items of each product you have in stock, you can find out wether you have to order something (you have to do that calculation in your code, there is no easy way to do this in MongoDB).

You see, with a "small" change, you get a lot of questions answered.

And that's basically how it works. With relational data, you model your data so that the entities are reflected properly and then you ask

How do I get my answers out of this data?

In NoSQL in general and especially with MongoDB you first ask

Which questions do I need to get answered?

and model your data accordingly. A subtle, but important difference.