I'm using Netezza. I'm working with marketing data, specifically coupons. Right now I'm counting distinct daily coupon redeemers; no big deal. I'd like to get the cumulative count of distinct redeemers to date, though. Note that it's not just the sum of daily redeemers: a customer might redeem on several different days, so summing the daily counts would count that customer more than once.
I closed my eyes, made a wish, and executed the following query hoping it would work:
select redemption_date
,count(distinct(customer_id)) as day_redeemers
,count(distinct(customer_id)) over (partition by null rows unbounded preceding) as cml_redeemers
from coupon_history
group by 1
order by 1
But Netezza complains: ERROR [HY000] ERROR: Attribute CUSTOMER_ID must be GROUPed or used in an aggregate function
...and so I close my eyes, make a wish, and execute the following (note the addition to the group by):
select redemption_date
,count(distinct(customer_id)) as day_redeemers
,count(distinct(customer_id)) over (partition by null rows unbounded preceding) as cml_redeemers
from coupon_history
group by 1,customer_id
order by 1
Netezza complains as follows:
ERROR [HY000] ERROR: DISTINCT aggregate not allowed in window with ORDER BY or frame specification
That error leads me to think that internally Netezza is ordering the customer_id in order to count transitions and thus distincts. But it does leave me at sort of a loss for what I should try next. I was hoping for something simple, but apparently it's not my lucky day.
Any ideas on how to make my original query work, or suggestions on an alternate approach?
Thanks!
You can always resort to brute force -- that is, to a correlated subquery:
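A minimal sketch of that idea, assuming the coupon_history table and the redemption_date and customer_id columns from the question. The aliases d and ch2 are just illustrative names. The daily counts are computed in a derived table, and for each day the subquery re-counts every distinct customer seen on or before that day. Whether Netezza accepts a correlated subquery with an inequality condition in the select list is something to verify on your own system.

```sql
-- Sketch only: table and column names are taken from the question,
-- the aliases d and ch2 are hypothetical.
select d.redemption_date,
       d.day_redeemers,
       -- For each day, re-scan the history and count every distinct
       -- customer who redeemed on or before that day.
       (select count(distinct ch2.customer_id)
        from coupon_history ch2
        where ch2.redemption_date <= d.redemption_date
       ) as cml_redeemers
from (select redemption_date,
             count(distinct customer_id) as day_redeemers
      from coupon_history
      group by redemption_date
     ) d
order by d.redemption_date;
```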
Of course, performance will not be as good as a window function's would be, since the history gets re-scanned for every date.
EDIT:
Another way to approach this is to get the first redemption date for each customer and then just use cumulative sums:
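A sketch of how this could look, again assuming the coupon_history table and columns from the question. The aliases c, f, d and the column names first_date and new_redeemers are made up for illustration. Each customer is counted once, on the day of their first redemption, so a running sum of those first-time counts gives the distinct redeemers to date. The left join keeps days on which nobody redeemed for the first time. Rows with a null customer_id are filtered out so the running total matches what count(distinct customer_id) would report.

```sql
-- Sketch only: aliases and derived column names are hypothetical.
select d.redemption_date,
       d.day_redeemers,
       -- Running total of first-time redeemers = distinct redeemers to date.
       sum(coalesce(f.new_redeemers, 0))
           over (order by d.redemption_date
                 rows unbounded preceding) as cml_redeemers
from (select redemption_date,
             count(distinct customer_id) as day_redeemers
      from coupon_history
      group by redemption_date
     ) d
left join
     -- Number of customers whose first redemption falls on each date.
     (select c.first_date,
             count(*) as new_redeemers
      from (select customer_id,
                   min(redemption_date) as first_date
            from coupon_history
            where customer_id is not null
            group by customer_id
           ) c
      group by c.first_date
     ) f
  on f.first_date = d.redemption_date
order by d.redemption_date;
```

This version uses a plain sum in the window rather than a DISTINCT aggregate, so it should avoid the "DISTINCT aggregate not allowed in window" error from the question.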