Creating a quartile column from the per-firm average instead of the row value

I have panel (time-series) data, and I would like to create a variable holding the quartile of the per-firm mean of a given variable, so that each firm falls into exactly one quartile. For example, if I have 4 companies:

 df = 
    id year value Quartile* Quartile**
    1  2010 1      1         1
    1  2015 1      1         1
    2  2010 10     2         2  
    2  2015 10     2         2
    3  2010 10     2         3
    3  2015 20     3         3
    4  2010 40     4         4
    4  2015 40     4         4

With the standard approach (Quartile*), such as:

# bin each row's value at the quartile breaks computed over all rows
df <- within(df, `Quartile*` <- as.integer(cut(value,
                                               quantile(value, probs = 0:4/4),
                                               include.lowest = TRUE)))

I obtain the values for Quartile*. However, I would like to prevent a company from having different quartile values over time. For this reason, I would like to compute the quartile from the average of all observations per firm, which gives the values in Quartile**. The key difference is that these values are firm-specific. Any idea how to implement this in my code?

There is 1 answer

Answer by lmo (best answer)

Here is one method using tapply, rank, and split.

# create 0 vector
dat$q <- 0
# fill it in
split(dat$q, dat$id) <- rank(tapply(dat$value, dat$id, FUN=mean))

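Here, tapply calculates the mean by ID, and rank ranks these means. We feed this ranking into column q of the data.frame using split. As a side note, because tapply and split both group the observations by id in the same order, the rows do not have to be sorted for this to work. Note that the ranks equal the quartiles here only because the example has exactly four firms with distinct means; with more firms, rank returns values from 1 to the number of firms, and tied means receive averaged ranks.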

This returns

dat
  id year value Quartile. Quartile.. q
1  1 2010     1         1          1 1
2  1 2015     1         1          1 1
3  2 2010    10         2          2 2
4  2 2015    10         2          2 2
5  3 2010    10         2          3 3
6  3 2015    20         3          3 3
7  4 2010    40         4          4 4
8  4 2015    40         4          4 4

where the q column matches the desired values in the Quartile.. column.
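To see why this works, the three steps can be run one at a time. This is only a sketch: it rebuilds the example's id, year and value columns inline, and the names firm_mean and firm_rank are introduced here for illustration.

```r
# Example data: id, year and value copied from the question
dat <- data.frame(id    = c(1, 1, 2, 2, 3, 3, 4, 4),
                  year  = rep(c(2010, 2015), times = 4),
                  value = c(1, 1, 10, 10, 10, 20, 40, 40))

# Step 1: one mean per firm, named by id
firm_mean <- tapply(dat$value, dat$id, FUN = mean)

# Step 2: rank the firm means (the smallest mean gets rank 1)
firm_rank <- rank(firm_mean)

# Step 3: write each firm's rank onto all of that firm's rows
dat$q <- 0
split(dat$q, dat$id) <- firm_rank

firm_mean  # per-firm means: 1, 10, 15, 40
dat$q      # 1 1 2 2 3 3 4 4
```

Firm 3 has values 10 and 20, so its mean of 15 places it above firm 2 (mean 10) in both years, which is the difference between Quartile* and Quartile**.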

data

dat <-
structure(list(id = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L), year = c(2010L, 
2015L, 2010L, 2015L, 2010L, 2015L, 2010L, 2015L), value = c(1L, 
1L, 10L, 10L, 10L, 20L, 40L, 40L), Quartile. = c(1L, 1L, 2L, 
2L, 2L, 3L, 4L, 4L), Quartile.. = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 
4L)), .Names = c("id", "year", "value", "Quartile.", "Quartile.."
), class = "data.frame", row.names = c(NA, -8L))
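
For a panel with more than four firms, one possible variant bins the firm means with cut and quantile instead of ranking them. The sketch below is not part of the original answer; the 8-firm data set is invented for illustration, and the names firm_mean, breaks and firm_q are assumptions.

```r
# Hypothetical panel: 8 firms, 2 years each (values invented for illustration)
dat <- data.frame(id    = rep(1:8, each = 2),
                  year  = rep(c(2010, 2015), times = 8),
                  value = c(1, 1, 10, 10, 10, 20, 40, 40,
                            5, 7, 30, 34, 50, 70, 90, 110))

# One mean per firm, named by id
firm_mean <- tapply(dat$value, dat$id, FUN = mean)

# Quartile breaks computed across firms, not across rows
breaks <- quantile(firm_mean, probs = 0:4 / 4)

# Bin each firm's mean into a quartile from 1 to 4
firm_q <- as.integer(cut(as.vector(firm_mean), breaks, include.lowest = TRUE))
names(firm_q) <- names(firm_mean)

# Look up each row's firm-level quartile by id
dat$q <- unname(firm_q[as.character(dat$id)])
```

Because the breaks come from the firm means, every row of a firm gets the same quartile, and here each quartile ends up holding two firms. If many firms share the same mean, the quantile breaks may not be unique, in which case cut stops with an error and the breaks would need to be handled separately.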