SAS retain statement and existing variables

1.1k views Asked by At

I'm trying to understand how the retain statement is supposed to work with existing variables, but still it seems I missing something as I do not get the desired result

In the following example my code aim to create a sort of counter for the value variable

 data new (sortedby=id);
 input id $ value count;
 datalines ;
 d 55 0
 d 66 0
 d 33 0
 run;

 data cc;
 set new;
 by id;
 retain count;
 count+value;
 run;

And I 'm expecting that the count variable will be the result of the cumulation of the value column. However, the result is not achived and the column keep its original 0 values.

I would be interested in understanding why the implict retain statement in the "+" sign is not working in this case.

It is an issue related to the fact that count is an already existing variables?

Bests

3

There are 3 answers

2
Quentin On BEST ANSWER

All the RETAIN statement does is prevent variable from being set to missing at the top of the DATA step. In your code, your SET statement reads a value for COUNT (0), so even though the value is retained, it is reset to 0 when the SET statement executes.

I would play with code like below, with lots of PUT statements in it:

data cc;
   put "Top of loop" (_n_ value count count2 count3)(=) ;
   set new;
   put "After set statement " (_n_ value count count2 count3)(=) ;
   by id;
   retain count;
   count+value;
   count2+value ;
   count3=sum(count3,value) ;
   put "After sum statement" (_n_ value count count2 count3)(=) ;
run;

At the top of the loop, Count and Count2 are retained. Count is retained because of the explicit retain statement, and because it was read on a SET set statement. Count2 is retained because the sum statement has an implicit retain. Count3 is not retained.

Results are like:

Top of loop         _N_=1 value=.  count=.  count2=0  count3=.
After set statement _N_=1 value=55 count=0  count2=0  count3=.
After sum statement _N_=1 value=55 count=55 count2=55 count3=55

Top of loop         _N_=2 value=55 count=55 count2=55  count3=.
After set statement _N_=2 value=66 count=0  count2=55  count3=.
After sum statement _N_=2 value=66 count=66 count2=121 count3=66

Top of loop         _N_=3 value=66 count=66 count2=121 count3=.
After set statement _N_=3 value=33 count=0  count2=121 count3=.
After sum statement _N_=3 value=33 count=33 count2=154 count3=33
Top of loop         _N_=4 value=33 count=33 count2=154 count3=.
0
Tom On

Yes the fact that the variable is already on the input dataset will impact your program. When the SET statement executes the retained value of COUNT is overwritten by the value of COUNT read from the input dataset.

Note that actually all variables that come from input dataset are already retained across data step iterations by SAS. This explains how the MERGE statement is able to implement a one to many merge. It also explains the way SAS keeps the values from the last observation in the shorter group when you do an N to M merge.

0
nickel715 On

From what I learned about retain, I would suggest this solution:

data cc;
if count = . then previous_count = 0; else previous_count = count;
set new;
by id;
drop previous_count;
count = previous_count + value;
run;

A few comments: since count already exists, SAS retains the variable anyway, and you can get the old value before the set is executed.

For the first iteration, count is . and SAS can not add a number to .. I simply fixed that with an if.

As Quentin suggested, adding some put statements really helps to better understand whats going on!