I have a map done in R with stat_density2d
. This is the code:
ggplot(data, aes(x=Lon, y=Lat)) +
stat_density2d(aes(fill = ..level..), alpha=0.5, geom="polygon",show.legend=FALSE)+
geom_point(colour="red")+
geom_path(data=map.df,aes(x=long, y=lat, group=group), colour="grey50")+
scale_fill_gradientn(colours=rev(brewer.pal(7,"Spectral")))+
xlim(-10,+2.5) +
ylim(+47,+60) +
coord_fixed(1.7) +
theme_void()
And it produces this:
Great. It works. However I do not know what the legend means. I did find this wikipedia page:
https://en.wikipedia.org/wiki/Multivariate_kernel_density_estimation
And the example they used (which contains red, orange and yellow) stated:
The coloured contours correspond to the smallest region which contains the respective probability mass: red = 25%, orange + red = 50%, yellow + orange + red = 75%
However, using stat_density2d, I have 11 contours in my map. Does anyone know how stat_density2d works and what the legend means? Ideally I wanted to be able to state something like the red contour contains 25% of the plots etc.
I have read this: https://ggplot2.tidyverse.org/reference/geom_density_2d.html and I am still none the wiser.
Let's take the
faithful
example from ggplot2:(apologies in advance for not making this prettier)
The level is the height at which the 3D "mountains" were sliced. I don't know of a way (others might) to translate that to a percentage but I do know to get you said percentages.
If we look at that chart, level
0.002
contains the vast majority of the points (all but 2). Level0.004
is actually 2 polygons and they contain all but ~dozen of the points. If I'm getting the gist of what you're asking that's what you want to know, except not count but the percentage of points encompassed by polygons at a given level. That's straightforward to compute using the methodology from the various ggplot2 "stats" involved.Note that while we're importing the
tidyverse
andsp
packages we'll use some other functions fully-qualified. Now, let's reshape thefaithful
data a bit:(easier to type
x
andy
)Now, we'll compute the two-dimensional kernel density estimation the way ggplot2 does:
I won't clutter the answer with
str()
output but it's kinda fun looking at what happens there.We can use spatial ops to figure out how many points fall within given polygons, then we can group the polygons at the same level to provide counts and percentages per-level:
I only spelled it out to avoid blathering more but the percentages will change depending on how you modify the various parameters to the density computation (same holds true for my
ggalt::geom_bkde2d()
which uses a different estimator).If there is a way to tease out the percentages without re-performing the calculations there's no better way to have that pointed out than by letting other SO R folks show how much more clever they are than the person writing this answer (hopefully in more diplomatic ways than seem to be the mode of late).