Filtering out array items in a CosmosDB query with best performance

844 views Asked by At

In CosmosDB I am able to select documents where items in an array have a given value using ARRAY_CONTAINS. For example:

SELECT * FROM d WHERE ARRAY_CONTAINS(d.Assignments, {'Owner':'Jason'}, true)

In the above query I get the following returned:

[
    {
        "id": "0",
        "Assignments": [
            {
                "Fruit": "Apple",
                "Owner": "Jason"
            },
            {
                "Fruit": "Orange",
                "Owner": "Jason"
            },
            {
                "Fruit": "Pear",
                "Owner": "Amy"
            }
        ]
    },
    {
        "id": "1",
        "Assignments": [
            {
                "Fruit": "Pear",
                "Owner": "Liz"
            },
            {
                "Fruit": "Grape",
                "Owner": "Jason"
            }
        ]
    },
    {
        "id": "2",
        "Assignments": [
            {
                "Fruit": "Grape",
                "Owner": "Liz"
            },
            {
                "Fruit": "Grape",
                "Owner": "Jason"
            }
        ]
    }
]

However I would also like the returned JSON to have all array items that do not match my query to be filtered out. For example:

[
    {
        "id": "0",
        "Assignments": [
            {
                "Fruit": "Apple",
                "Owner": "Jason"
            },
            {
                "Fruit": "Orange",
                "Owner": "Jason"
            }
        ]
    },
    {
        "id": "1",
        "Assignments": [
            {
                "Fruit": "Grape",
                "Owner": "Jason"
            }
        ]
    },
    {
        "id": "2",
        "Assignments": [
            {
                "Fruit": "Grape",
                "Owner": "Jason"
            }
        ]
    }
]

I would prefer to find a way to do this in my query assuming I can do so with good performance and relatively low Request Units.

Is it more advisable to filter out the results in code once the JSON is returned?

In some scenarios I may have a few hundred array items with about 60-80% needing to be filtered out.

1

There are 1 answers

5
Harshita Singh On BEST ANSWER

I added 3 more similar documents for these records. You can use below query to fulfill this requirement in an optimum way:

SELECT f.id, ARRAY(SELECT * FROM c in f.Assignments WHERE c.Owner = 'Jason') AS Assignments FROM f WHERE ARRAY_CONTAINS(f.Assignments, {'Owner':'Jason'}, true)

Results:

[
    {
        "id": "0",
        "Assignments": [
            {
                "Fruit": "Apple",
                "Owner": "Jason"
            },
            {
                "Fruit": "Orange",
                "Owner": "Jason"
            }
        ]
    },
    {
        "id": "1",
        "Assignments": [
            {
                "Fruit": "Grape",
                "Owner": "Jason"
            }
        ]
    },
    {
        "id": "2",
        "Assignments": [
            {
                "Fruit": "Grape",
                "Owner": "Jason"
            }
        ]
    },
    {
        "id": "3",
        "Assignments": [
            {
                "Fruit": "Grape",
                "Owner": "Jason"
            }
        ]
    },
    {
        "id": "4",
        "Assignments": [
            {
                "Fruit": "Grape",
                "Owner": "Jason"
            }
        ]
    },
    {
        "id": "5",
        "Assignments": [
            {
                "Fruit": "Grape",
                "Owner": "Jason"
            }
        ]
    }
]

Query Stats:

enter image description here