I have a large dataset with columns as follows:
| ID |State |
| 1 | MI |
| 1 | CA |
| 2 | CA |
| 2 | CA |
| 3 | WY |
| 3 | WY |
I want to keep the following records in a new data set, if any ID contains CA in state, then keep all of their records otherwise, delete those rows.
RESULT
| ID |State |
| 1 | MI |
| 1 | CA |
| 2 | CA |
| 2 | CA |
How can I do this with SAS?
libname sql 'D:/Ro/';
proc sql;
create table sql.idstate
(state char(2), /* 2–character column for */
/* state abbreviation */
ID num /* column for date of entry into the US */
);
Created some sort of table, but it's empty - I think I need to add a data statement somewhere?
Assuming you meant that you have a dataset like this:
Then you should be able to use a subquery with the IN operator in PROC SQL.
If you are having performance issues with a large dataset and it is already sorted by ID then a data step might perform better. The data step will also have to pass through the dataset twice, but it can do it one ID at a time and so take advantage of disk caching the operating system is probably doing already.