I want to calculate a median by groups for 100 variables with hash tables.
I have find this to calculate the median of invoice on sashelp.cars, but if I want to do the median by Make and Model, for example, how do I adapt it?
data percentiles ;
keep percentile Invoice ;
format percentile percent5.;
dcl hash ptiles (dataset: "sashelp.cars(where=(Invoice gt 0))",multidata:"Y",ordered:"A");
ptiles.definekey("Invoice");
ptiles.definedone();
declare hiter iterP ("ptiles");
array _ptiles(6) _temporary_ (.5 .05 .1 .25 .75 .95);
call sortn(of _ptiles(*));
num_items=ptiles.num_items;
do i=1 to dim (_ptiles);
percentile=_ptiles(i);
do while (Counter lt percentile*num_items);
Counter+1;
iterP.next();
end;
output;
end;
stop;
set sashelp.cars;
run;
In fact, in my reel data i want to calculate the median of 100 variables. Actually , i do this with proc univariate but it is so long (>12hours)
This is not a good idea. You are very unlikely to write a DATA step like this that will be faster than PROC MEANS:
Note PROC MEANS may be significantly faster than PROC UNIVARIATE, because it does less work. You could also try a simple SQL step.
If your UNIVARIATE step is taking a long time, you probably have lots of data. On My PC, I ran your DATA step and PROC MEANS using as input:
Your DATA step took 8 seconds, PROC MEANS took 2 seconds.
Also note that your DATA step has a very simplistic approach for calculating percentiles, which is not handling ties. If you compare the results from your step they will not match results from PROC UNIVARIATE or PROC MEANS.
There are lots of efficiency factors to consider when working with SAS (where is the data? Is there a network involved? etc.) But generally you don't want to hand-code something that is provided by SAS as a feature.