SAS has a procedure called rank that assigns a "rank" to each row in a data frame according to the position of a variable's value in the ordered set of values, more or less. The rank is not just the position, though: you have to tell the procedure how many groups to use in the ranking, and the rank is actually the group to which the row belongs.
In SQL terms, this is closer to NTILE (splitting the ordered rows into a fixed number of buckets) than to a dense ranking.
Example (the salary variable is included for generality, but it is not used in this example):
Say we have this data frame:
If we rank by age using 4 groups, SAS would give us this:
It is easier to understand what happened if we sort the data by the variable we ranked:
Now we can see why rank gives us the position in an ordered set, kind of.
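To make the grouping rule concrete, here is a minimal sketch in Python (not Deedle). It assumes the formula SAS documents for the GROUPS= option, FLOOR(rank * k / (n + 1)), where k is the number of groups and n is the number of non-missing values. The function name `sas_style_groups` and the ages are made up for illustration; they are not the data from the example above. Ties are broken by original order here, whereas SAS, as far as I know, averages tied ranks by default.

```python
def sas_style_groups(values, groups):
    """Return a group number (0 .. groups-1) for each value.

    Assumption: group = floor(rank * groups / (n + 1)), with rank being the
    1-based position of the value in ascending order.
    """
    n = len(values)
    # Indices of the values in ascending order (stable, so ties keep input order).
    order = sorted(range(n), key=lambda i: values[i])
    result = [0] * n
    for rank, i in enumerate(order, start=1):
        # Integer division implements the floor for non-negative operands.
        result[i] = (rank * groups) // (n + 1)
    return result


# Hypothetical ages, ranked into 4 groups.
ages = [23, 45, 31, 52, 38, 27, 60, 41]
print(sas_style_groups(ages, 4))  # [0, 2, 1, 3, 1, 0, 3, 2]
```

With 8 rows and 4 groups, each group ends up with two rows: the two youngest get group 0, the next two get group 1, and so on.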
The rank procedure is very useful, but I couldn't find in Deedle's documentation how to perform it. Is there a direct way to do it in Deedle, or do I need to create my own extension?
I suppose I could do it using these functions:
SortRows(frame, key)
chunk size series
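The sort-then-chunk idea can be sketched like this, again in Python rather than Deedle, so none of the names below are Deedle API. `rank_by_sort_and_chunk` is a hypothetical helper, and the chunk size of ceil(n / groups) is my assumption about how the chunks would be sized.

```python
import math


def rank_by_sort_and_chunk(values, groups):
    """Sort the rows, cut the sorted order into equal chunks, and use the
    chunk number as the rank of every row in that chunk."""
    n = len(values)
    # Assumed chunk size: enough rows per chunk to cover n rows in `groups` chunks.
    chunk_size = math.ceil(n / groups)
    # Indices of the values in ascending order.
    order = sorted(range(n), key=lambda i: values[i])
    result = [0] * n
    for position, i in enumerate(order):
        # Rows at positions 0..chunk_size-1 land in chunk 0, and so on.
        result[i] = position // chunk_size
    return result


# Hypothetical ages, not the data from the example above.
ages = [23, 45, 31, 52, 38, 27, 60, 41]
print(rank_by_sort_and_chunk(ages, 4))  # [0, 2, 1, 3, 1, 0, 3, 2]
```

When the number of rows divides evenly by the number of groups, this gives the same groups as the formula-based approach; when it does not, the two can assign rows near the chunk boundaries differently.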
I wrote my own extension:
where indexForKey is this other custom extension:
I tried this other definition hoping that it would run faster. It is slightly faster, but not by much; any comments on performance issues are welcome: