MongoDB multikey index on array of embedded document - sort/range comparisons and min/maxkey

380 views Asked by At

Simplified example and question

In effort to be as simple as possible, I'll just jump into an example:

Collection foo with multikey index {searchTags: 1}:

{_id: 1, searchTags: [{bar: "BAR"}]}
{_id: 2, searchTags: [{baz: "BAZ"}]}

I'm trying to get all embedded documents with the baz field key (without using $exists - I can try to explain why later). Why does

{searchTags: {$elemMatch: { $gte: { baz: MinKey() }, $lte: { baz: MaxKey() }}}}

return BOTH documents (not preferred), but

{searchTags: {$elemMatch: { $gte: { baz: "" }, $lte: { baz: MaxKey() }}}}

only returns {_id: 2, searchTags: [{baz: "BAZ"}]} (preferred)?

Sidenote: quick schema details to maybe avoid the "what are you doing?" questions

  • These are simple embedded documents that only have 1 key-value pair.
  • The values can be multiple types, not just strings (otherwise, I would have just used an array of prefixed strings instead of embedded documents)
  • Being that the embedded docs are singular key-value pairs, I can deterministically use equality and comparison queries on the embedded docs.
  • The keys are other field names on foo documents, and the values are their field values. I could index each of the various fields, but they would be partial indexes and there would be a lot of them. For flexibility, merging them all into an array with a multikey index makes the most sense.
  • Still open to hear why I'm wrong :)
1

There are 1 answers

8
Joe On
  • The index on {searchTags: 1} is not being used to service that query

When indexing an array, each element of the array is included as a value in the index. Any element that happens to be a document is indexes as an entire document, it is not broken down by field. I expect if you were to run that command using explain, it would show it is a collection scan.

  • MinKey is less than any other possible key value, including null, missing, and undefined. Likewise for MaxKey

A quick demonstration:

> db.collection.insertMany([
 {_id: 1, searchTags: [{bar: "BAR"}]},
 {_id: 2, searchTags: [{baz: "BAZ"}]} 
])
{ "acknowledged" : true, "insertedIds" : [ 1, 2 ] }

> db.collection.aggregate([
  {$unwind: "$searchTags"},
  {$addFields: {
      baztype: {$type: "$searchTags.baz"},
      bazmin: {$gt: ["$searchTags.baz", MinKey()]},
      bazmax: {$lt: ["$searchTages.baz", MaxKey()]},
      bazstr: {$gt: ["$searchTags.baz", ""]}
   }}
])
{ "_id" : 1, "searchTags" : { "bar" : "BAR" }, "baztype" : "missing", "bazmin" : true, "bazmax" : true, "bazstr" : false }
{ "_id" : 2, "searchTags" : { "baz" : "BAZ" }, "baztype" : "string", "bazmin" : true, "bazmax" : true, "bazstr" : true }

The first query you show matches because every possible value, including undefined, is both greater than MinKey and less than MaxKey.

The second query you show doesn't match because the operators are case-sensitive, so when a string value is provided, only string values will match, and this does not include undefined.

  • To find documents that contain a field, regardless of type, use the $exists operator:
db.collection.find({"searchTags.baz":{$exists:true}})