I'm running an Elasticsearch cluster on AWS that doesn't have access to X-Pack,
but I'd still like to do a cumulative cardinality aggregation
to determine the daily counts of new users to my site.
Is there an alternative solution to this problem?
For example, how could I transform:
GET /user_hits/_search
{
  "size": 0,
  "aggs": {
    "users_per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "distinct_users": {
          "cardinality": {
            "field": "user_id"
          }
        },
        "total_new_users": {
          "cumulative_cardinality": {
            "buckets_path": "distinct_users"
          }
        }
      }
    }
  }
}
to produce the same result without cumulative_cardinality?
Cumulative cardinality was added precisely for that reason -- it wasn't easily calculable before...
As with almost anything in Elasticsearch, though, there's a script to get it done for ya. Here's my take on it.
yielding
The script is guaranteed to be slow, but it has one potentially quite useful advantage: you can adjust it to return the full list of new user IDs, not just the count. The cumulative cardinality aggregation only gives you the count and, according to its implementation's author, only works in a sequential, cumulative manner by design.
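To make the underlying logic concrete, here is a minimal client-side sketch in Python. It is not the script referred to above, only an illustration of the same idea: find the first day each user appears, then keep a running total per day. The function name `new_users_per_day`, the `(day, user_id)` input shape, and the sample data are all assumptions made for this example; in practice the pairs would come from your own query results.

```python
from collections import defaultdict


def new_users_per_day(hits):
    """Return one row per day: (day, sorted new user IDs, running total).

    `hits` is a list of (day, user_id) pairs, where `day` is an ISO date
    string such as "2020-10-01" (a hypothetical sample format).
    """
    # Earliest day on which each user appears.
    first_seen = {}
    for day, user_id in hits:
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day

    # Group users under the day they first appeared.
    new_by_day = defaultdict(set)
    for user_id, day in first_seen.items():
        new_by_day[day].add(user_id)

    # Walk the days in order, keeping a running total of new users.
    rows = []
    total = 0
    for day in sorted({d for d, _ in hits}):
        new_ids = sorted(new_by_day.get(day, ()))
        total += len(new_ids)
        rows.append((day, new_ids, total))
    return rows


# Made-up sample data, for illustration only.
sample_hits = [
    ("2020-10-01", "a"),
    ("2020-10-01", "b"),
    ("2020-10-02", "a"),
    ("2020-10-02", "c"),
    ("2020-10-03", "b"),
]
for row in new_users_per_day(sample_hits):
    print(row)
```

The running total plays the role of total_new_users in the query above, and the list of IDs is the extra information that the count alone does not give you. Note that this sketch counts exactly, whereas the cardinality aggregation is approximate, and that it only emits days that have at least one hit.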