At work I have a set of floating-point values that I sort, compute a CDF for, and plot in gnuplot. I'd like to draw a line showing where the 80% and 90% thresholds of the CDF are, i.e. a line coming in from the left at the 0.8 y tic mark, touching the graph, and then dropping down to the corresponding x value. This is to help guide the viewer's eye.
The data is generated automatically and I make multiple plots so I don't want to have to hand craft these lines each time.
It's trivial to draw a horizontal arrow going completely across the plot at the 0.8 and 0.9 y-value points, but I don't understand how to determine where the vertical line should be drawn. There is a Q&A about drawing arrows, "Gnuplot: Vertical lines at specific positions", but there the positions are known a priori.
Here is some sample data (my work machine is not internet accessible, so sharing is hard):
X | Y
5.0 | 0.143
8.0 | 0.288
16.0 | 0.429
25.0 | 0.714
39.0 | 0.857
47.0 | 1.000
Any ideas?
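One possible way to find the crossing points automatically, sketched here with awk under a few assumptions: the data sits in a whitespace-separated two-column file (the name cdf.dat is made up for this example), sorted by x, with no header line, and the threshold is taken to be reached at the first data point whose cumulative value is at or above it (no interpolation between points). The helper name threshold_x and the output file thresholds.gp are likewise hypothetical.

```shell
#!/bin/sh
# Sketch: emit gnuplot "set arrow" commands for CDF thresholds.
# Assumption: cdf.dat (hypothetical name) holds x and cumulative fraction,
# whitespace-separated, sorted by x, no header line.

# Sample data from the question, written out so the sketch is self-contained.
cat > cdf.dat <<'EOF'
5.0 0.143
8.0 0.288
16.0 0.429
25.0 0.714
39.0 0.857
47.0 1.000
EOF

# Print the first x whose cumulative fraction is >= the threshold given as $1.
threshold_x() {
    awk -v t="$1" '$2 + 0 >= t + 0 { print $1; exit }' cdf.dat
}

# Two arrows per threshold: left axis to the curve, then curve down to the x axis.
: > thresholds.gp
for t in 0.8 0.9; do
    x=$(threshold_x "$t")
    echo "set arrow from graph 0, first $t to first $x, first $t nohead" >> thresholds.gp
    echo "set arrow from first $x, first $t to first $x, graph 0 nohead" >> thresholds.gp
done
```

Inside gnuplot, running load 'thresholds.gp' before the plot command should then draw the guide lines. Since the script is regenerated from the data each time, nothing has to be hand-crafted per plot. If the CDF is drawn with straight line segments rather than steps, the vertical line would need a linearly interpolated x instead of the first data point at or above the threshold.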
Here is my take (using percentile ranks), which only assumes that a univariate series of measurements is available (your column headed X). You may want to tweak it a little to work with your pre-computed cumulative frequencies, but that's not really difficult.
This yields the following output:
You can add as many percentile values as you want, of course; you just have to define a new variable, e.g. perc90, as well as ask for two other arrow commands, and replace every occurrence of 0.8 (ah... the joy of magic numbers!) by the desired one (in this case, 0.9).
Some explanations about the above code:
table (first four lines); (we could ask awk to start at the 5th line, but let's go with that.)
trunc(rank(x))/length(x) to get the percentile ranks.)
If you want to give R a shot, you can safely replace that long series of sed/awk commands with a call to R like
assuming rnd.dat is in your home directory.
Sidenote: And if you can live without gnuplot, here are some R commands to do that kind of graphics (even without using the quantile function):