Cumulative count distinct in Netezza


I'm using Netezza. I'm working with marketing data, specifically coupons. Right now I'm counting distinct daily coupon redeemers; no big deal. I'd like to get the count of distinct redeemers to date, though. Note that it's not just the sum of daily redeemers, since a customer might redeem on different days, and therefore the sum of daily redeemers could be multi-counting customers.

I closed my eyes, made a wish, and executed the following query hoping it would work:

select redemption_date
      ,count(distinct(customer_id)) as day_redeemers
      ,count(distinct(customer_id)) over (partition by null rows unbounded preceding) as cml_redeemers
from coupon_history
group by 1
order by 1

But Netezza complains: ERROR [HY000] ERROR: Attribute CUSTOMER_ID must be GROUPed or used in an aggregate function

...and so I close my eyes, make a wish, and execute the following (note the addition to the group by):

select redemption_date
      ,count(distinct(customer_id)) as day_redeemers
      ,count(distinct(customer_id)) over (partition by null rows unbounded preceding) as cml_redeemers
from coupon_history
group by 1,customer_id
order by 1

Netezza complains as follows:

ERROR [HY000] ERROR:  DISTINCT aggregate not allowed in window with ORDER BY or frame specification

That error leads me to think that internally Netezza is ordering the customer_id in order to count transitions and thus distincts. But it does leave me at sort of a loss for what I should try next. I was hoping for something simple, but apparently it's not my lucky day.

Any ideas on how to make my original query work, or suggestions on an alternate approach?

Thanks!

Answer by Gordon Linoff (best answer):

You can always resort to brute force -- that is, to a correlated subquery:

select redemption_date,
       count(distinct(customer_id)) as day_redeemers,
       (select count(distinct ch2.customer_id)
        from coupon_history ch2
        where ch2.redemption_date <= ch.redemption_date
       ) as cml_redeemers
from coupon_history ch
group by 1
order by 1;

Of course, performance will not be as good, since the subquery may have to rescan coupon_history for every redemption date.

EDIT:

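Another way to approach this is to get the first redemption date for each customer and then take a cumulative sum of the number of first-time redeemers per date. Each customer is counted exactly once, on the date of their first redemption, so the running total is the distinct count to date: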

select minrd,
       sum(count(*)) over (order by minrd) as cml_redeemers
from (select ch.customer_id, min(ch.redemption_date) as minrd
      from coupon_history ch
      group by ch.customer_id
     ) first_rd
group by minrd
order by minrd;
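
One caveat with the query above: it only returns dates on which at least one customer redeemed for the first time, and it drops the day_redeemers column from the question. A minimal sketch of how the two could be combined follows, assuming the same coupon_history table and columns as in the question. The aliases d, f, and first_rd and the column new_redeemers are names I made up for illustration, and I have not run this on Netezza. As a sanity check on the logic: if customer A redeems on day 1 and day 2 and customer B redeems on day 2, the daily counts are 1 and 2, while the cumulative counts are 1 and 2 (not 3).

```sql
-- d: one row per redemption date with that day's distinct redeemers.
-- f: one row per date with the number of customers whose FIRST redemption was that date.
-- The left join keeps dates with no first-time redeemers; coalesce treats them as 0.
select d.redemption_date,
       d.day_redeemers,
       sum(coalesce(f.new_redeemers, 0))
           over (order by d.redemption_date
                 rows between unbounded preceding and current row) as cml_redeemers
from (select redemption_date,
             count(distinct customer_id) as day_redeemers
      from coupon_history
      group by redemption_date
     ) d
left join
     (select minrd,
             count(*) as new_redeemers
      from (select customer_id,
                   min(redemption_date) as minrd
            from coupon_history
            group by customer_id
           ) first_rd
      group by minrd
     ) f
  on f.minrd = d.redemption_date
order by d.redemption_date;
```

Because d has exactly one row per date, the rows frame gives a plain running total. Note that no DISTINCT aggregate appears inside a window here, which is what triggered the error in the question.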