SAS DATA STEP WHERE CONDITION IN A LIST

2.5k views Asked by At

I tried to filter data where they are on a list by using Data step in SAS

proc sql;
create table id_list as
select distinct id from customer;
quit;

data test;
set fulldata;
where id in id_list;
run;

It doesnt work. However, if I use "where id in (1,2,3)" it works. Could anyone please help me with where in a list of data ? Thanks

3

There are 3 answers

2
Richard On BEST ANSWER

If your fulldata is sorted or indexed by id then you can use a MERGE.

This approach is useful when the list is very large and could exceed 64K characters when placed in a macro variable.

proc sort data=customer(keep=id) out=list nodupkey;
  by id;

data want;
  merge fulldata(in=full) list(in=list);
  by id;
  if full and list;
run;

For the case of wanting to stack multiple data sets the use of a hash is recommended.

Example:

Several big tables with some overlapping id values are to be stacked and filtered by matching ids to those in a smaller table that might have repeated ids.

data big1 big2 big3 big4 big5 big6 big7;
  do id = 1 to 6666;
    if 0.00 <= id / 3333 <= 0.50 then output big1;
    if 0.25 <= id / 3333 <= 0.75 then output big2;
    if 0.50 <= id / 3333 <= 1.00 then output big3;
    if 0.75 <= id / 3333 <= 1.25 then output big4;
    if 1.00 <= id / 3333 <= 1.50 then output big5;
    if 1.25 <= id / 3333 <= 1.75 then output big6;
    if 1.50 <= id / 3333 <= 2.00 then output big7;
  end;
run;

data small;
  do _n_ = 1 to 666;
    id = rand('integer', 6666);
    output;
    do while (rand('uniform') < 0.10);
      output;
    end;
  end;
run;

data want;
  attrib id length=8;

  if _n_ = 1 then do;
    declare hash lookup (dataset:'small');
    lookup.defineKey('id');
    lookup.defineDone();
  end;

  set big1-big7 indsname=from;
  source=from;

  if lookup.check() = 0;
run;
4
Stu Sztukowski On

You need to use a macro variable to save and reference your id list. You cannot reference a separate table in a where statement within the data step.

proc sql noprint;
    select distinct id 
    into :id_list separated by ','
    from customer
    ;
quit;

&id_list will resolve to 1,2,3 if your distinct customer IDs are 1, 2, and 3.

data test;
    set fulldata;
    where id in(&id_list);
run;
1
Reeza On

SQL is easiest here directly, using a subquery in the IN filter.

proc sql;
create table test as
select * 
from fulldata 
where id in 
  (select distinct id from customer);
quit;