R : Apply ecdf function on array


First, I have a matrix like this:

x <- matrix(rnorm(1e3),260)

and then an array built from it:

lst <- lapply(seq(1,length(x[,1]), by=52), function(i) x[i:(i+51),])
Data_array <- array(unlist(lst), dim=c(52,length(x[1,]),(length(x[,1])/52)))

This array splits the matrix into consecutive blocks of 52 rows (weeks). It's for a weekly time-series analysis.

I would like to compute an ecdf function on this array.

, , 1

             [,1]        [,2]        [,3]
 [1,]  **0.66319631**  0.01004290  0.02133477
 [2,] -1.64273648  0.23105503  1.02862145
 [3,]  1.17083363 -0.49700717 -0.01119745

, , 2

             [,1]        [,2]         [,3]
 [1,] **-0.79365987**  1.28394049 -0.547763434
 [2,] -0.09221301  1.07676841  0.570294731
 [3,]  0.20293308  1.00182888  0.247373981

, , 3

             [,1]         [,2]        [,3]
 [1,]  **1.03862172** -0.961678683  1.25334651
 [2,]  0.58476540  0.745250484 -0.06183788
 [3,]  0.24057690  1.226575038  0.23363005

I want to compute the ecdf for each cell, taken across the third dimension. It's for a weekly seasonal analysis.

i.e. compute the quantiles for this time series (marked with **): 0.66319631; -0.79365987; 1.03862172

For the mean it works:

array_lag_sum<-apply(Data_array,c(1,2),FUN=function(x){mean(x,na.rm=TRUE)})

I tried a similar call with ecdf, but it doesn't work:

percent_array<-apply(Data_array,c(1,2),FUN=function(u){ecdf(u)(u)})

And that is not all: afterwards I would like to reshape this array back into the format of the original matrix (x), like an rbind but on an array.

Thank you so much for your help.

Edit:

Sorry, I'm not sure I was clear. Arrays are certainly complicated for me.

But with your method, if I have this simple matrix:

B <- matrix(seq(1,20), 20, 3)

    > B
          [,1] [,2] [,3]
     [1,]    1    1    1
     [2,]    2    2    2
     [3,]    3    3    3
     [4,]    4    4    4
     [5,]    5    5    5
     [6,]    6    6    6
     [7,]    7    7    7
     [8,]    8    8    8
     [9,]    9    9    9
    [10,]   10   10   10
    [11,]   11   11   11
    [12,]   12   12   12
    [13,]   13   13   13
    [14,]   14   14   14
    [15,]   15   15   15
    [16,]   16   16   16
    [17,]   17   17   17
    [18,]   18   18   18
    [19,]   19   19   19
    [20,]   20   20   20

Your function gives :

    Data_array <- array( B, dim=c(10,3,5))

, , 1

      [,1] [,2] [,3]
 [1,]    1   11    1
 [2,]    2   12    2
 [3,]    3   13    3
 [4,]    4   14    4
 [5,]    5   15    5
 [6,]    6   16    6
 [7,]    7   17    7
 [8,]    8   18    8
 [9,]    9   19    9
[10,]   10   20   10

, , 2

      [,1] [,2] [,3]
 [1,]   11    1   11
 [2,]   12    2   12
 [3,]   13    3   13
 [4,]   14    4   14
 [5,]   15    5   15
 [6,]   16    6   16
 [7,]   17    7   17
 [8,]   18    8   18
 [9,]   19    9   19
[10,]   20   10   20

whereas I would rather have something like this:

, , 1

      [,1] [,2] [,3]
 [1,]    1    1    1
 [2,]    2    2    2
 [3,]    3    3    3
 [4,]    4    4    4
 [5,]    5    5    5
 [6,]    6    6    6
 [7,]    7    7    7
 [8,]    8    8    8
 [9,]    9    9    9
[10,]   10   10   10

, , 2

      [,1] [,2] [,3]
 [1,]   11   11   11
 [2,]   12   12   12
 [3,]   13   13   13
 [4,]   14   14   14
 [5,]   15   15   15
 [6,]   16   16   16
 [7,]   17   17   17
 [8,]   18   18   18
 [9,]   19   19   19
[10,]   20   20   20

and get as a result a table holding the percentile values of each time series: the percentile values of 1 and 11, of 2 and 12, and so on, for each column and each row (I know this isn't meaningful, it's just an example).

Sorry if my last question was not understandable.

There are 1 answers

IRTFM On BEST ANSWER

The answer is:

 ecdf_mat <- apply( Data_array, 1:2, ecdf)

This passes the values from each combination of the first two indices to the function ecdf. Each of those calls returns a function into a matrix location. You are getting something most people will not be able to use without a bit of coaching: a 52 x 4 matrix of functions. The functions are contained in lists, which are valid matrix or array elements:

> dim(apply( Data_array, 1:2, ecdf) )
[1] 52  4

To access them you need to first pull them out of the matrix with standard "[" indexing but then pull them out of the list container with a call to "[[1]]":

> str(apply( Data_array, 1:2, ecdf)[1,1] )
List of 1
 $ :function (v)  
  ..- attr(*, "class")= chr [1:3] "ecdf" "stepfun" "function"
  ..- attr(*, "call")= language FUN(newX[, i], ...)

> apply( Data_array, 1:2, ecdf)[1,1][[1]]
Empirical CDF 
Call: FUN(newX[, i], ...)
 x[1:5] = -0.92217, -0.37471, 0.058284, 0.28502, 0.44391

> apply( Data_array, 1:2, ecdf)[1,1][[1]](0)
[1] 0.4
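The same access pattern can be sketched on a small deterministic array, which makes the results easy to check by hand. The array `toy` and the name `f11` are assumptions made up for illustration, not part of the data above. The sketch also uses `[[i, j]]` on the list-matrix, which should reach the stored function in one step instead of `[i, j][[1]]`.

```r
# Toy array: 2 rows, 3 columns, 5 layers; values are illustrative only.
toy <- array(1:30, dim = c(2, 3, 5))

# One ecdf per (row, column) position, built from the 5 values along dim 3.
ecdf_mat <- apply(toy, 1:2, ecdf)

# toy[1, 1, ] is 1, 7, 13, 19, 25
f11 <- ecdf_mat[[1, 1]]   # same function as ecdf_mat[1, 1][[1]]

f11(13)                   # proportion of toy[1, 1, ] that is <= 13, i.e. 0.6
f11(toy[1, 1, ])          # the series evaluated against its own ecdf
```

Evaluating a stored function on its own series, as in the last line, gives the proportion of values at or below each element of that series.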

Edit:------

It appears you don't want the ecdf's themselves (despite getting no response to my efforts at getting you to recognize the distinction), but rather want an identically shaped array with the percentile values for the i-j positions considered as individual length-k sequences. I can think of two ways to do this. The first one would use that matrix of ecdf functions I built and demonstrated, but I believe that is the more baroque method and it would be easier to give you a more direct route. I've taken the liberty of making this more manageable by making the long first dimension only 10 long.

x <- matrix(rnorm(200), 50)   # 50 x 4, i.e. five blocks of 10 rows
lst <- lapply(seq(1, nrow(x), by=10), function(i) x[i:(i+9),])
Data_array <- array(unlist(lst), dim=c(10, ncol(x), nrow(x)/10))

pctiles2 <-  apply( Data_array,  1:2, function(x) ecdf(x)(x) )

> str(pctiles2)
 num [1:5, 1:10, 1:4] 0.8 0.4 0.6 0.2 1 0.4 1 0.2 0.6 0.8 ...

They aren't actually percentiles, but that could be easily remedied by slipping a 100* in front of the ecdf call (or multiplying the result by 100). You will notice that the structure has been permuted so that the quantile/percentile sequences run down the first dimension. That is because apply always places the values returned by each call along the first dimension of its result. There is a function aperm which would allow you to re-arrange these in the original order:

re_pctiles <- aperm(pctiles2, c(2,3,1) )
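
To go one step further and recover the layout of the original 2-D matrix, the blocks along the third dimension can be stacked again. The following is only a sketch under assumptions: it reuses the small integer matrix `B` from the question, builds the array in the block layout the question asks for (rows 1 to 10 in the first layer, rows 11 to 20 in the second), and the names `n_per`, `n_blk`, `blocks`, `pct`, `pct_arr` and `pct_mat` are made up for illustration.

```r
B <- matrix(seq(1, 20), 20, 3)   # 20 rows, 3 identical columns
n_per <- 10                      # rows per block (would be 52 for weekly data)
n_blk <- nrow(B) / n_per         # number of blocks

# Layer k holds rows ((k - 1) * n_per + 1):(k * n_per) of B.
blocks <- aperm(array(B, dim = c(n_per, n_blk, ncol(B))), c(1, 3, 2))

# ecdf of each (row, column) series across layers, evaluated on that series.
# apply puts the returned values first: dim is n_blk x n_per x ncol(B).
pct <- apply(blocks, 1:2, function(u) ecdf(u)(u))

# Move the layer index back to third position: n_per x ncol(B) x n_blk.
pct_arr <- aperm(pct, c(2, 3, 1))

# Stack the layers on top of each other to get back a 20 x 3 matrix.
pct_mat <- matrix(aperm(pct_arr, c(1, 3, 2)), ncol = ncol(B))
```

Here each series has only two values (for example 1 and 11), so rows 1 to 10 of `pct_mat` come out as 0.5 and rows 11 to 20 as 1. Multiply by 100 if percentages are wanted.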