Array to include macro variable in SAS

67 views Asked by At

I would like to sum up and create & rates, each month of data is a column in my dataset. I want to have a dataset with the rate and sum for each month at the end. I currently run this code to create some calculations:

PROC SQL;
CREATE TABLE WANT AS 
SELECT 
 (SUM(ROLL_&mmmyy2) /SUM(ACCOUNTS_&mmmyy2))*100 AS &mmmyy2
,(SUM(ROLL_&mmmyy3) /SUM(ACCOUNTS_&mmmyy3))*100 AS &mmmyy3
,(SUM(ROLL_&mmmyy4) /SUM(ACCOUNTS_&mmmyy4))*100 AS &mmmyy4
,(SUM(ROLL_&mmmyy5) /SUM(ACCOUNTS_&mmmyy5))*100 AS &mmmyy5
,(SUM(ROLL_&mmmyy6) /SUM(ACCOUNTS_&mmmyy6))*100 AS &mmmyy6
,(SUM(ROLL_&mmmyy7) /SUM(ACCOUNTS_&mmmyy7))*100 AS &mmmyy7
,(SUM(ROLL_&mmmyy8) /SUM(ACCOUNTS_&mmmyy8))*100 AS &mmmyy8
,(SUM(ROLL_&mmmyy9) /SUM(ACCOUNTS_&mmmyy9))*100 AS &mmmyy9
,(SUM(ROLL_&mmmyy10) /SUM(ACCOUNTS_&mmmyy10))*100 AS &mmmyy10
,(SUM(ROLL_&mmmyy11) /SUM(ACCOUNTS_&mmmyy11))*100 AS &mmmyy11
,(SUM(ROLL_&mmmyy12) /SUM(ACCOUNTS_&mmmyy12))*100 AS &mmmyy12
,(SUM(ROLL_&mmmyy13) /SUM(ACCOUNTS_&mmmyy13))*100 AS &mmmyy13
,(SUM(ROLL_&mmmyy14) /SUM(ACCOUNTS_&mmmyy14))*100 AS &mmmyy14

,SUM(ROLL_&mmmyy2) AS SUM_&mmmyy2
,SUM(ROLL_&mmmyy3) AS SUM_&mmmyy3
,SUM(ROLL_&mmmyy4) AS SUM_&mmmyy4
,SUM(ROLL_&mmmyy5) AS SUM_&mmmyy5
,SUM(ROLL_&mmmyy6) AS SUM_&mmmyy6
,SUM(ROLL_&mmmyy7) AS SUM_&mmmyy7
,SUM(ROLL_&mmmyy8) AS SUM_&mmmyy8
,SUM(ROLL_&mmmyy9) AS SUM_&mmmyy9
,SUM(ROLL_&mmmyy10) AS SUM_&mmmyy10
,SUM(ROLL_&mmmyy11) AS SUM_&mmmyy11
,SUM(ROLL_&mmmyy12) AS SUM_&mmmyy12
,SUM(ROLL_&mmmyy13) AS SUM_&mmmyy13
,SUM(ROLL_&mmmyy14) AS SUM_&mmmyy14

FROM HAVE;
QUIT;

The &mmmyy(number) macros are all dates.

Is there a way to simplify this, I'm currently trying:

DATA WANT;
SET HAVE;
ARRAY ROLL_&&MMMYY[12] ROLL_&&MMMYY2-ROLL_&&MMMYY14; 
ARRAY ACCOUNTS_&&MMMYY[12] ACCOUNT&&MMMYY2-ACCOUNT&&MMMYY14;
DO I = 2 TO 14;
&&MMMYY[I]= (SUM(ROLL_&&MMMYY[I])/SUM(ACCOUNTS_&&MMMYY[I]))*100;
END;
RUN;

This is an example of how the data looks:

Account Number  ROLL_Jun23  Roll_May23  Roll_Apr23  Roll_Mar23 Accounts_Jun23   Accounts_May23  Accounts_Apr23  Accounts_Mar23
1                  1            0           0           0           1                   1           1                 1
2                  0            1           1           0           1                   1           0                 0
3                  0            0           0           0           1                   0           0                 0
4                  1            1           1           1           1                   1           1                 0
5                  1            1           1           1           1                   1           1                 0
6                  1            1           0           0           1                   1           0                 0

The idea of my array is that I would end up with data like:

Jun23        May23        Apr23        Mar23        
66.67        80.00        100.00       200.00
2

There are 2 answers

2
Richard On BEST ANSWER

You should treat data as data, and thus I emphasize using a pivot strategy to take the dates out of the variable names.

Use a reporting procedure such as TABULATE or REPORT to present summaries of your now categorical data.

Example:

A DATA step view is used to pivot the data for use in TABULATE.

data have;
 input Account_Number ROLL_Jun23 Roll_May23 Roll_Apr23 Roll_Mar23 Accounts_Jun23 Accounts_May23 Accounts_Apr23 Accounts_Mar23;
 datalines;
1 1 0 0 0 1 1 1 1
2 0 1 1 0 1 1 0 0
3 0 0 0 0 1 0 0 0
4 1 1 1 1 1 1 1 0
5 1 1 1 1 1 1 1 0
6 1 1 0 0 1 1 0 0
;

data flags / view=flags;
  set have;
  array rolls roll_:;
  array accounts accounts_:;
  do over rolls;
    date = input('01'||scan(vname(rolls),2,'_'),date7.);
    roll_flag = rolls;
    acct_flag = accounts;
    output;
  end;
  format date monyy5.;
  keep account_number date roll_flag acct_flag;
run;

proc tabulate data=flags;
  class account_number;
  class date / descending;
  var roll_flag acct_flag;
  table 
    date='Month' * (roll_flag='Roll' acct_flag='Acct') * f=5. * sum='';
  ;
run;

Results

-------------------------------------------------
|                     Month                     |
|-----------------------------------------------|
|   JUN23   |   MAY23   |   APR23   |   MAR23   |
|-----------+-----------+-----------+-----------|
|Roll |Acct |Roll |Acct |Roll |Acct |Roll |Acct |
|-----+-----+-----+-----+-----+-----+-----+-----|
|    4|    6|    4|    5|    3|    3|    2|    1|
-------------------------------------------------
3
Tom On

The best thing to do is not to place data, dates in this case, into metadata, the names of variables. If you just restructure the data to have three variables and N observations instead of 2*N variables then you have not reason to use macro code to generate code. Now your key variables become ACCOUNT_NUMBER and DATE and your data variables become just ROLL and ACCOUNTS and you could use SQL more easily.

 select date
      , sum(roll) as roll_sum
      , sum(accounts) as accounts_sum
      , calculated roll_sum/calculated accounts_sum as average
   from have
   group by date 
 ;

If you do want to use ARRAY you probably need to first take the SUM of the values.

proc summary data=have ;
   var roll_: accounts_: ;
   output out=summary sum= ;
run;

Now your arrays could look like:

data want;
  set summary;
  array roll roll_:;
  array accounts accounts_: ;
  do index=1 to dim(roll);
     length date $5;
     date = scan(vname(roll[index]),-1,'_');
     roll_sum = roll[index];
     account_sum = accounts[index];
     average = roll_sum/account_sum ;
     output;
  end;
  keep index date roll_sum account_sum average;
run;