Finding total sum by aggregating Redis JSON

361 views Asked by At

I want to find total website visits by summing up the visits of individual pages.

In my Redis, I have JSONs like:

{'page_name': 'home', 'visit_count' : 10}
{'page_name': 'add_to_cart', 'visit_count' : 7}
{'page_name': 'checkout', 'visit_count' : 5}

I want to run an aggregation logic using FT.AGGREATE so that I can get, total_visits = 10 + 7 + 5 = 22

I can add necessary indexing, if required.

1

There are 1 answers

1
A. Guy On BEST ANSWER

You can use FT.AGGREGATE for that. You will have to index at least one field for that.

Here is an example using your data.

Add Data

127.0.0.1:6379> JSON.SET doc:1 $ '{"page_name": "home", "visit_count" : 10}' 
OK
127.0.0.1:6379> JSON.SET doc:2 $ '{"page_name": "add_to_cart", "visit_count" : 7}' 
OK
127.0.0.1:6379> JSON.SET doc:3 $ '{"page_name": "checkout", "visit_count" : 5}' 
OK

Create an index

At least one field needs to be indexed. Here are few possible options:

  1. Indexing visit_count:

    FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA $.visit_count AS visit_count SORTABLE 
    
  2. Indexing page_name

    FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA $.page_name AS page_name TAG 
    
  3. Indexing page_name and referring to visit_count(multiple lines for readability):

    FT.CREATE idx ON JSON PREFIX 1 doc: SCHEMA 
        $.visit_count AS visit_count SORTABLE NOINDEX 
        $.page_name AS page_name TAG 
    

    Notice that I mention visit_count with NOINDEX for quick access but without indexing (so it cannot be referred to in the query part, but can be referred to as part of the aggregations).

Querying the index

The trick is to use some dummy field that is equal in all the documents. You can use an existing one if you have one, but if not you can create one using APPLY (documentation here):

APPLY 1 AS __dummy 

will add to all the documents a field named __dummy with the value 1. This is only for the query context and it will not be added to the JSON object itself. Of course, any name and value are possible.

For cases 1 and 3:

FT.AGGREGATE idx * APPLY 1 AS __dummy GROUPBY 1 @__dummy REDUCE SUM 1 @visit_count AS count 

For case 2:

FT.AGGREGATE idx * APPLY 1 AS __dummy LOAD 3 $.visit_count AS visit_count GROUPBY 1 @__dummy REDUCE SUM 1 @visit_count AS count 

Output (for all cases):

1) (integer) 1
2) 1) "__dummy"
   2) "1"
   3) "count"
   4) "22"

EDIT

Apparently, you can GROUPBY 0 field, so the APPLY trick is redundant.

Use

FT.AGGREGATE idx * GROUPBY 0 REDUCE SUM 1 @visit_count AS count 

for cases 1 and 3, or, for case 2:

FT.AGGREGATE idx * LOAD 3 $.visit_count AS visit_count GROUPBY 0 REDUCE SUM 1 @visit_count AS count 

to get the output:

1) (integer) 1
2) 1) "count"
   2) "22"

Using redis-py:

import redis
import redis.commands.search
import redis.commands.search.reducers as reducers
from redis.commands.search.aggregation import AggregateRequest
from redis.commands.json.path import Path
from redis.commands.search.field import (
    NumericField,
    TagField,
)
from redis.commands.search.indexDefinition import IndexDefinition, IndexType


conn = redis.Redis(host="localhost", port=6379, db=0)

definition = IndexDefinition(
    prefix=["doc:"],
    index_type=IndexType.JSON,
)

conn.ft().create_index((
        TagField("$.page_name", as_name="page_name"),
        NumericField("$.visit_count", as_name="visit_count", sortable=True, no_index=True),
    ),
    definition=definition,
)

conn.json().set("doc:1", Path.root_path(), {"page_name": "home", "visit_count": 10})
conn.json().set("doc:2", Path.root_path(), {"page_name": "add_to_cart", "visit_count": 7})
conn.json().set("doc:3", Path.root_path(), {"page_name": "checkout", "visit_count": 5})

total_sum = conn.ft().aggregate(AggregateRequest().group_by([], reducers.sum("visit_count"))).rows[0][1]
print(int(total_sum)) # 22

ALTERNATIVE APPROACH

If you don't need to index the documents for other reasons, you can use a LUA script to scan all the keys and sum up this value. Scanning the entire key space might be slow if you have a lot of keys, but it will not require additional memory overhead and indexing time when modifying or adding new keys