SAS Macro to Combine Municipal Proc SQL Statements Based on Date Criteria

61 views Asked by At

I have a series of proc sql statements which pull data for Active, Inactive and Lapsed customers. I end up with 3 tables.

  • *Customers_Active
  • *Customers_InActive
  • *Customers_Lapsed

Active: 0-12M purchaser

Inactive: 13-24M purchaser, did not purchase 0-12M (active day range minus 12 months)

Lapsed: 25-36M purchaser, did not purchase 0-24M (inactive day range minus 24 months)

Again, In order to do this I have 3 separate proc sql statements and I start out by defining 6 macro variables.

I want to apply macro language so I can uses just 1 query to produce the 3 tables.

%let actstart = '24Feb2013'd;
%let actend = '22Feb2014'd;
%let iactstart = '26Feb2012'd;
%let iactend= '23Feb2013'd;
%let lapstart = '27Feb2011'd;
%let lapend = '25Feb2012'd;*

Here are the three statements

/Active Table/

proc sql;    
   create table Customers_Active as 
   select 
      household_id, 
      customer_id, 
      web_order_id, 
      transaction_date, 
      sku, 
      quantity, 
      original_price 
   from transaction_vw 
   where transaction_date >= &actstart and transaction_date <= &actend
   order by customer_id; 
quit;

/Inactive Table/

Notice I am not bringing in the customer_id which is already in the Customers_Active table.

proc sql;    
   create table Customers_Inactive as 
   select 
      household_id, 
      customer_id, 
      web_order_id, 
      transaction_date, 
      sku, 
      quantity, 
      original_price 
   from transaction_vw
   where transaction_date >= & iactstart and transaction_date <= &iactend
     and customer_id not in (select distinct customer_id from Customers_Active)
    order by customer_id; 
quit;

/Lapsed Table/

Notice I am not bringing in the customer_id which is not already in the Customers_Active and the Customers_Inactive table.

proc sql;
   create table Customers_Lapsed as 
   select 
      household_id, 
      customer_id, 
      web_order_id, 
      transaction_date, 
      sku, 
      quantity, 
      original_price 
   from transaction_vw
   where transaction_date >= & lapstart and transaction_date <= & lapend
     and customer_id not in (select distinct customer_id from Customers_Active)
     and customer_id not in (select distinct customer_id from Customers_Inactive)
   order by customer_id;
quit;

To recap: the end result is 3 tables.

  • *Customers_Active
  • *Customers_InActive
  • *Customers_Lapsed

•I'm pulling in the same vars for all three tables •Just the date range changes •In addition I don't want to include customer_id in the Customers_Inactive table which is already in the Customers_Active table •and customer_ids in the lapsed table which are already in the Customer_Active and Customer_inactive tables

Again, since I'm pulling in the same vars I don't want to have three separate queries to do this just one.

1

There are 1 answers

0
Joe On BEST ANSWER

I would do this in a bit different manner. You can do it in a few ways, but maybe one SQL step and one datastep would be easiest.

proc sql;
  create table lookup_lastdate as 
    select customer_id as start, max(transaction_Date) as label,
      'LASTDATEF' as fmtname
    from transaction_vw
    group by customer_id;
quit;

proc format cntlin=lookup_lastdate;
quit;

%let today=11JUN2015;
data customers_active customers_inactive customers_lapsed;
  set transaction_vw;
  years = intck('YEAR',put(customer_id,LASTDATEF.),"&today."d,'c');
  if years < 1 then output customers_active;
  else if years < 2 then output customers_inactive;
  else if years < 3 then output customers_lapsed;
run;

intck with the c modifier at the end tells number of periods between two dates, so 1 year means between 1 year and 1 year 364 days.

The point of doing it this way is to use fewer passes through the data - so, first calculate how long last purchase was, then output their data. It could be even more efficient than this in a DoW loop (which does 1 real pass and 2 virtual passes, assuming data fits in memory). So we use a format to get it out (you could also join the tables if you like joins better, formats are just faster). Then we compare that date using intck to see how many years have passed, and output to the appropriate file.

As a side note, there is a preference in SAS for not putting 3 files like this, but having 1 file and then adding a flag indicating which group it belongs to; then using by statements to do analyses on each group separately. This simplifies the code further. To do that, just change the if statements to set flags instead of outputting - or skip the if and just assign the year count directly.