R: Winsorize one column per year for panel data


I would like to winsorize the column "Return". However, I want to winsorize per Year. My data looks like this:

structure(list(Name = c("A", "A", "A", "A", "A", "A", "B", "B", 
"B", "B", "B", "B", "C", "C", "C", "C", "C", "C"), Date = c("01.09.2018", 
"02.09.2018", "03.09.2018", "05.11.2021", "06.11.2021", "07.11.2021", 
"01.09.2018", "02.09.2018", "03.09.2018", "05.11.2021", "06.11.2021", 
"07.11.2021", "01.09.2018", "02.09.2018", "03.09.2018", "05.11.2021", 
"06.11.2021", "07.11.2021"), Return = c(0.05, 0.1, 0.8, 1.5, 
0.1, -2, 0.4, 0.6, 0.6, 0.2, -0.5, -0.9, -0.4, 1.7, 0.3, -4, 
0.6, 0.5), Year = c("2018", "2018", "2018", "2021", "2021", "2021", 
"2018", "2018", "2018", "2021", "2021", "2021", "2018", "2018", 
"2018", "2021", "2021", "2021")), row.names = c(NA, -18L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), groups = structure(list(Year = c("2018", 
"2021"), .rows = structure(list(c(1L, 2L, 3L, 7L, 8L, 9L, 13L, 
14L, 15L), c(4L, 5L, 6L, 10L, 11L, 12L, 16L, 17L, 18L)), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

I tried the following code, which runs without errors:

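library(dplyr)
library(DescTools)  # assumption: Winsorize() with a `probs` argument comes from DescTools

# Winsorize "Return" within each year
Data <- Data %>%
  group_by(Year) %>%
  mutate("Winsorized Return" = Winsorize(`Return`, probs=c(0.01, 0.99)))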

Then I ran the same code without grouping the data by year:

Data <- Data %>%
  mutate("Winsorized Return" = Winsorize(`Return`, probs=c(0.01, 0.99)))

I get exactly the same results even though I used group_by(Year) in the first snippet. Can someone explain to me why the results are the same? The same thing happens with my real (very large) dataset.
Do I need different code to winsorize the column "Return" per year?

Thank you very much already for any help!
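
A possible explanation, judging from the dput() output above: the object already has class "grouped_df" with Year as its grouping variable, and the first snippet assigns the grouped result back to Data. If that is the case, the second mutate() still runs per year unless the grouping is dropped with ungroup() first. The sketch below illustrates this on a small made-up tibble (not the data from the question). winsorize_q() is a hypothetical stand-in for Winsorize(), written with quantile(), pmin() and pmax() so that the example does not depend on a particular DescTools version.

```r
library(dplyr)

# Hypothetical stand-in for Winsorize(): clamp x to its own lower/upper quantiles.
winsorize_q <- function(x, probs = c(0.01, 0.99)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Small made-up panel, only for illustration.
toy <- tibble(
  Year   = c("2018", "2018", "2018", "2021", "2021", "2021"),
  Return = c(0.05, 0.10, 0.80, 1.50, 0.10, -2.00)
)

# The grouping is stored on the object itself, so it survives assignment.
grouped <- toy %>% group_by(Year)
group_vars(grouped)  # "Year"

# Both calls winsorize per year, because `grouped` is still a grouped_df.
a <- grouped %>% group_by(Year) %>% mutate(W = winsorize_q(Return))
b <- grouped %>% mutate(W = winsorize_q(Return))

# Dropping the grouping first gives the pooled (all years together) version.
pooled <- grouped %>% ungroup() %>% mutate(W = winsorize_q(Return))

# Compare the per-year and pooled results.
isTRUE(all.equal(a$W, b$W))
isTRUE(all.equal(a$W, pooled$W))
```

If this is what happens with the real data, calling is_grouped_df(Data) or group_vars(Data) before the second snippet should show whether the grouping is still attached. Inserting ungroup() ahead of the ungrouped mutate() would then give the pooled version for comparison.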
