I have panel (time-series) data, and I would like to create a variable holding the quartile of the mean of a given variable, so that each firm falls into exactly one quartile. For example, if I have 4 companies:
df =
id  year  value  Quartile*  Quartile**
1   2010  1      1          1
1   2015  1      1          1
2   2010  10     2          2
2   2015  10     2          2
3   2010  10     2          3
3   2015  20     3          3
4   2010  40     4          4
4   2015  40     4          4
With the standard approach, I compute Quartile* as follows:
df <- within(df, `Quartile*` <- as.integer(cut(value,
                                              quantile(value, probs = 0:4/4),
                                              include.lowest = TRUE)))
This gives the values in Quartile*. However, I would like to prevent a company from moving between quartiles over time. For this reason, I would like to compute the quartile from the average of all observations per firm, which gives the values in Quartile**. The key difference is that these values are constant within each firm. Any idea how to implement this in my code?
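A minimal base R sketch of the per-firm idea, assuming the example data above (columns id, year, value). The column names firm_mean and quartile_firm are hypothetical. The quartile breaks are computed from one mean per firm, and each row then receives the quartile of its own firm's mean, so the value cannot change over time.

```r
# Example panel from the question: four firms, two years each
df <- data.frame(
  id    = rep(1:4, each = 2),
  year  = rep(c(2010, 2015), times = 4),
  value = c(1, 1, 10, 10, 10, 20, 40, 40)
)

# Firm-level mean, repeated on every row belonging to that firm
df$firm_mean <- ave(df$value, df$id, FUN = mean)

# Breaks come from one mean per firm, not from all rows,
# so firms observed in more years do not get extra weight
firm_means <- tapply(df$value, df$id, mean)
breaks <- quantile(firm_means, probs = 0:4/4)

# Quartile of each row's firm mean; constant within a firm
df$quartile_firm <- as.integer(cut(df$firm_mean, breaks, include.lowest = TRUE))
df
```

Here quartile_firm reproduces the Quartile** column. If many firms share the same mean, quantile() can return duplicate breaks and cut() then stops with an error about non-unique breaks; a rank-based assignment avoids that.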
Here is one method using tapply, rank, and split. Here, tapply calculates the mean by id, and rank ranks these means. We feed this ranking into column q of the data.frame using the replacement form split<-. As a side note, because tapply and split order the observations into the same groups in the same order, the observations do not have to be in any particular order for this to work. In the result, the q column matches the desired values in the Quartile** column.
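A minimal sketch of this tapply/rank/split<- approach, assuming the example data from the question; the column name q follows the text, and initialising it with NA first is an assumption needed so that split<- has a vector to fill.

```r
# Example panel from the question
df <- data.frame(
  id    = rep(1:4, each = 2),
  year  = rep(c(2010, 2015), times = 4),
  value = c(1, 1, 10, 10, 10, 20, 40, 40)
)

# Create the target column so split<- has something to write into
df$q <- NA_integer_

# tapply: one mean per id (groups ordered by id)
# rank:   position of each firm's mean, 1 = smallest
# split<-: write each firm's rank into every row of that firm
split(df$q, df$id) <- rank(tapply(df$value, df$id, mean))
df
```

Note that rank() returns each firm's position among all firms, which equals the quartile here only because there are exactly four firms; tied means also receive averaged ranks such as 2.5. With more firms, the firm means would still need to be binned, for example with cut() over quantile() of the firm means.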
data
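One way to load the question's example into R is to read the table text directly with read.table(); this is a sketch based on the table in the question. Because read.table() uses check.names = TRUE by default, the headers Quartile* and Quartile** are converted to the syntactic names Quartile. and Quartile.., which is why the column can show up as Quartile.. in printed output.

```r
# Read the example table from the question; header names are made syntactic
df <- read.table(header = TRUE, text = "
id year value Quartile* Quartile**
1 2010 1 1 1
1 2015 1 1 1
2 2010 10 2 2
2 2015 10 2 2
3 2010 10 2 3
3 2015 20 3 3
4 2010 40 4 4
4 2015 40 4 4
")
names(df)
```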