Using analytical Count(distinct) on Vertica is not supported

1.9k views Asked by At

Having a thorough Google research, it seems that Vertica DB simply does not support count(distinct <col>) over(<partition by>), as it causes:

"ERROR 4249: Only MIN/MAX are allowed to use DISTINCT ... MIN/MAX are allowed to use DISTINCT" 

I'm looking for an easy walk-around for this one.

Meanwhile, I'm using joins or nested queries.

For example:

select campaign_id, segment_id, COUNT(DECODE(rank, 1, 1, NULL)) over()
from (select campaign_id, segment_id, row_number() over(partition by segment_id) rank
 from cs)

But my query is very long and I need to invent tricks all over the way. Any idea for a better approach?

Thanks!

(Working at HPE? Please implement this, as you did for all common analytical funcitions!)

1

There are 1 answers

0
kimbo305 On

I had to do something similar nested counting structure for counting distinct values cumulatively, over a date range. It boiled down to a similar gathering up of ROW_NUMBER() = 1 rows, though I used case:

COUNT(CASE WHEN rank = 1 THEN userID END) OVER (...)

It wasn't pretty to look at, but it was mercifully not slow.

I need to invent tricks all over the way

Yeah, I think that just happens when you bump into missing features.