Identify vector index or value at a specific cumulative sum (or probability) in R


This seems like a simple problem, but for some reason I haven't been able to find a solution.

I have a matrix of probabilities that sum to 1, and I want to know at which value the cumulative sum reaches, for example, 0.5. In other words, if I turned this matrix into a vector sorted in decreasing order, how far from the highest value do I have to go to reach a cumulative sum of 0.5?
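To make the target quantity concrete, here is a small worked example on a made-up 2x3 probability matrix `p` (hypothetical data, not from the original post). It assumes "reaching 0.5" means the first position in the decreasing order where the running total is at least 0.5.

```r
# Hypothetical 2x3 probability matrix (values sum to 1), for illustration only
p <- matrix(c(0.05, 0.30, 0.10, 0.25, 0.20, 0.10), nrow = 2)

xs <- sort(as.vector(p), decreasing = TRUE)  # 0.30 0.25 0.20 0.10 0.10 0.05
cs <- cumsum(xs)                             # 0.30 0.55 0.75 0.85 0.95 1.00
k  <- which(cs >= 0.5)[1]                    # 2: the two largest cells reach 0.5
xs[k]                                        # 0.25
```

If you instead want the last value whose running total is still at or below 0.5, that is position `k - 1` here (value 0.30), which matches the `<= 0.5` convention used in the solution further down.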

I transformed my matrix into a vector of values and used plot(cumsum(x)) to produce the following graph:

[Figure: Cumulative Sum of Vector Values]

I can do something like

P<-ecdf(x)
P(0.00001)

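to get the empirical CDF at an x value of 0.00001 (strictly, the fraction of values that are at most 0.00001, rather than the cumulative sum of the probabilities themselves), but I want to go in the other direction, i.e. what is the x value at which the cumulative sum reaches 0.5?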

quantile(x, 0.5) gives me the value halfway through the ordered values (e.g. it would give me the value of sort(x)[4e+05] in the graph above), which is not what I'm after.
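The reason `quantile()` and `ecdf()` do not answer the question is that both count values, whereas the question weights each value by itself. A minimal sketch on a made-up probability vector (hypothetical data, chosen so the difference is easy to see):

```r
# Made-up probability vector (sums to 1), for illustration only
x <- c(0.5, 0.2, 0.1, 0.1, 0.05, 0.05)

# quantile(): the median of the six values, i.e. a statement about counts
quantile(x, 0.5, names = FALSE)   # 0.1

# ecdf(): fraction of values <= t, again counting values, not summing them
P <- ecdf(x)
P(0.1)                            # 4/6, about 0.667

# Summing the values themselves (largest first) gives cumulative probability
xs <- sort(x, decreasing = TRUE)
cumsum(xs)                        # 0.50 0.70 0.80 0.90 0.95 1.00
```

Here the single largest value already carries half of the probability mass, while the median of the values is only 0.1.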

Thanks for your help with this seemingly simple question!

Cheers, Josh

Solution:

x[max(which(cumsum(x)<=0.5))]

gives the value at the cumulative sum of 0.5, provided x has first been sorted in decreasing order (e.g. x <- sort(x, decreasing = TRUE)); thanks @plafort. It seems as though there should be an easier way, though!
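One possibly simpler route is base R's `findInterval()`: because the running total of non-negative probabilities never decreases, `findInterval(p, cumsum(xs))` counts how many running totals are at or below `p`, which is the same index as `max(which(cumsum(xs) <= p))`. The helper name `value_at_cumprob` below is hypothetical, and this is a sketch assuming all entries are non-negative.

```r
# Hypothetical helper: position (in decreasing order) of the last value whose
# running total is still <= p, plus that value. Returns index 0 and NA when
# even the largest value alone exceeds p (where max(which(...)) would warn).
value_at_cumprob <- function(x, p = 0.5) {
  xs <- sort(as.vector(x), decreasing = TRUE)  # highest probabilities first
  cs <- cumsum(xs)                             # non-decreasing for x >= 0
  k  <- findInterval(p, cs)                    # number of running totals <= p
  list(index = k, value = if (k > 0) xs[k] else NA_real_)
}

value_at_cumprob(matrix(c(0.05, 0.30, 0.10, 0.25, 0.20, 0.10), nrow = 2))
```

For the example matrix this gives index 1 and value 0.30. The next position, `index + 1`, is the first one at which the running total exceeds `p`.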


Answer by SabDeM:

I think I understand what you want. Here is my solution, where the goal is to find the first element of the matrix at which the cumulative sum is >= 20, for example. I suspect there is a much simpler way to achieve this, though.

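set.seed(1)
data <- matrix(rnorm(9, 10), 3, 3)
data
#           [,1]      [,2]     [,3]
# [1,]  9.373546 11.595281 10.48743
# [2,] 10.183643 10.329508 10.73832
# [3,]  9.164371  9.179532 10.57578

# cumsum() treats the matrix as a vector in column-major order
which(cumsum(data) >= 500)[1]   # threshold never reached, so NA
# [1] NA
which(cumsum(data) >= 20)[1]    # first linear index where the total hits 20
# [1] 3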
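Since `which()` on `cumsum(data)` returns a linear (column-major) index, base R's `arrayInd()` can translate it back into a row/column position in the matrix. A minimal sketch reusing the same seeded data:

```r
set.seed(1)
data <- matrix(rnorm(9, 10), 3, 3)

idx <- which(cumsum(data) >= 20)[1]  # linear index in column-major order
arrayInd(idx, dim(data))             # row 3, column 1
data[idx]                            # the element itself, about 9.164371
```

Note that column-major order is not sorted order; for the "largest values first" reading of the question, sort the values before taking `cumsum()`.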