Adding information from another variable, or adding weights, to stat_density_2d


Below is simple code that produces a stat_density_2d plot of X~Y.

library(ggplot2)

plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Z = c(rnorm(300, 60, 5), rnorm(150, 40, 5)),
             Label = c(rep('A', 300), rep('B', 150)))
ggplot(plot_data, aes(x = X, y = Y)) +
  # after_stat() replaces the deprecated stat() for computed variables such as level
  stat_density_2d(geom = "polygon", aes(alpha = after_stat(level), fill = Label))

As I understand it, the density is based on the count of points in X~Y. My question is: can I use Z as the density instead, or add Z as a weight to the density? I'm not sure I'm making sense here. The density of X~Y is already useful to me as it is; I'm just wondering whether I can add the information in Z to it.

Perhaps you have an alternative idea? Both the density of X~Y and the values of Z are information that I want to convey. Currently I plot them as separate densities (X~Y, X~Z, Y~Z), and they are all useful to me (using my own data, of course).
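One way to read "add Z as a weight" is a weighted kernel density estimate, where each point contributes in proportion to its Z value. The sketch below is only an illustration in base R: `weighted_kde2d` is a hypothetical helper (it is not part of ggplot2 or MASS), it uses a Gaussian product kernel with `stats::bw.nrd0` bandwidths (an assumption, not the bandwidth rule stat_density_2d uses), and it assumes Z is non-negative.

```r
# Hypothetical helper (not part of ggplot2 or MASS): weighted 2D Gaussian KDE
# evaluated on an n x n grid spanning the range of the data.
weighted_kde2d <- function(x, y, w, n = 50,
                           hx = stats::bw.nrd0(x), hy = stats::bw.nrd0(y)) {
  stopifnot(length(x) == length(y), length(x) == length(w), all(w >= 0))
  w  <- w / sum(w)                                  # weights sum to 1
  gx <- seq(min(x), max(x), length.out = n)
  gy <- seq(min(y), max(y), length.out = n)
  kx <- stats::dnorm(outer(gx, x, "-"), sd = hx)    # n x N kernel values along X
  ky <- stats::dnorm(outer(gy, y, "-"), sd = hy)    # n x N kernel values along Y
  # z[a, b] = sum_i w[i] * kx[a, i] * ky[b, i]
  z <- kx %*% (w * t(ky))
  list(x = gx, y = gy, z = z)
}

set.seed(1)
x <- c(rnorm(300, 3, 2.5), rnorm(150, 7, 2))
y <- c(rnorm(300, 6, 2.5), rnorm(150, 2, 2))
z <- c(rnorm(300, 60, 5), rnorm(150, 40, 5))

dens <- weighted_kde2d(x, y, w = z)   # assumes Z is non-negative

# Long format for ggplot2: expand.grid varies X fastest, matching as.vector(dens$z)
dens_long <- cbind(expand.grid(X = dens$x, Y = dens$y),
                   Density = as.vector(dens$z))
```

Only the relative size of the weights matters, so a Z that varies little around a large mean changes the density very little. Dividing a Z-weighted estimate by the unweighted one gives a kernel-weighted local average of Z, which may be closer to "showing Z on top of X~Y" than the weighted density itself.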

Edit 2, manual calculation plan: I'm still working this out as I go. This is the general idea of what I plan to do.

  1. Instead of using stat_density_2d, calculate the density myself with the same method stat_density_2d uses, which is MASS::kde2d().
  2. Use interpolation, such as akima::interp, to interpolate Z onto the X~Y grid.
  3. Multiply the interpolated Z by the density of X~Y (from step 1) as a form of weighting.
  4. Plot the results again using ggplot.
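Steps 1 and 3 of this plan can be sketched as below. This is a rough illustration, not a finished implementation: `density_long` is a hypothetical helper, the constant `z_grid` is only a placeholder for the interpolated Z surface from step 2, and the multiplication assumes the Z grid and the kde2d grid share the same breakpoints.

```r
library(MASS)  # kde2d(); MASS is a recommended package, usually installed with R

# Hypothetical helper: turn a kde2d()-style list (x, y, z matrix) into a long
# data frame and, optionally, multiply the density by a matching grid of Z values.
density_long <- function(kde, z_grid = NULL) {
  out <- expand.grid(X = kde$x, Y = kde$y)   # X varies fastest, like as.vector()
  out$Density <- as.vector(kde$z)
  if (!is.null(z_grid)) {
    stopifnot(all(dim(z_grid) == dim(kde$z)))
    out$Weighted <- out$Density * as.vector(z_grid)   # step 3 of the plan
  }
  out
}

set.seed(1)
x <- rnorm(300, 3, 2.5)
y <- rnorm(300, 6, 2.5)
kde <- kde2d(x, y, n = 100)                  # step 1

# Placeholder for step 2: a constant surface standing in for the interpolated Z grid
z_grid <- matrix(60, nrow = 100, ncol = 100)
long <- density_long(kde, z_grid)
```

The resulting data frame has one row per grid cell, so it can be passed straight to a contour geom with `aes(x = X, y = Y, z = Weighted)`.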

Edit 3: Update with code that applies Z as a weight to the density of X~Y.

library(ggplot2)
library(data.table)
library(ggnewscale)
library(akima)
library(MASS)
library(metR)  # provides geom_contour_fill(), which is not part of ggplot2

plot_data <-
  data.frame(X = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)),
             Y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
             Z = c(rnorm(300, 60, 5), rnorm(150, 60, 5)),
             Label = c(rep('A', 300), rep('B', 150)))
setDT(plot_data)

#Interpolation of Z into X~Y grid for Label A
int_plot_data_A=with(plot_data[Label=="A"],interp(x=X,y=Y,z=Z,nx=100,ny=100))
rownames(int_plot_data_A$z)=int_plot_data_A$x
colnames(int_plot_data_A$z)=int_plot_data_A$y
plot_data_Z_A <- melt(int_plot_data_A$z)
names(plot_data_Z_A) <- c("X", "Y", "Z")

#Calculation of kde2d for Label A
plot_data_A=plot_data[Label=="A"]
kde2d_A=kde2d(plot_data_A$X,plot_data_A$Y,n=100)
rownames(kde2d_A$z)=kde2d_A$x
colnames(kde2d_A$z)=kde2d_A$y
plot_kde_A <- melt(kde2d_A$z, na.rm = TRUE)
names(plot_kde_A) <- c("X", "Y", "Z")

#Interpolation of Z into X~Y grid for Label B
int_plot_data_B=with(plot_data[Label=="B"],interp(x=X,y=Y,z=Z,nx=100,ny=100))
rownames(int_plot_data_B$z)=int_plot_data_B$x
colnames(int_plot_data_B$z)=int_plot_data_B$y
plot_data_Z_B <- melt(int_plot_data_B$z)
names(plot_data_Z_B) <- c("X", "Y", "Z")

#Calculation of kde2d for Label B
plot_data_B=plot_data[Label=="B"]
kde2d_B=kde2d(plot_data_B$X,plot_data_B$Y,n=100)
rownames(kde2d_B$z)=kde2d_B$x
colnames(kde2d_B$z)=kde2d_B$y
plot_kde_B <- melt(kde2d_B$z, na.rm = TRUE)
names(plot_kde_B) <- c("X", "Y", "Z")

#Filtering out values under 0.01. It makes the plot better. This is subjective
setDT(plot_kde_A)
plot_kde_A[Z<0.01]=NA
setDT(plot_kde_B)
plot_kde_B[Z<0.01]=NA

#Calculate for A weighted with Z
plot_kde_A_Weight_Z=plot_kde_A
plot_kde_A_Weight_Z$Z=plot_kde_A_Weight_Z$Z*plot_data_Z_A$Z

#Calculate for B weighted with Z
plot_kde_B_Weight_Z=plot_kde_B
plot_kde_B_Weight_Z$Z=plot_kde_B_Weight_Z$Z*plot_data_Z_B$Z


ggplot() +
  geom_contour_fill(data=plot_kde_A,aes(x=X,y=Y,z=Z),alpha=0.8,bins=10) +
  scale_fill_continuous(low = "white", high = "blue") +
  geom_contour(data=plot_kde_A_Weight_Z,aes(x=X,y=Y,z=Z),bins=10) +
  new_scale_fill() +
  geom_contour_fill(data=plot_kde_B,aes(x=X,y=Y,z=Z),alpha=0.8,bins=10) +
  scale_fill_continuous(low = "white", high = "red") +
  geom_contour(data=plot_kde_B_Weight_Z,aes(x=X,y=Y,z=Z),color="red",bins=10)

Edit 1: After searching around without the ggplot keyword, I found something called kernel density estimation, and this post, "Plot contours of distribution on all three axes in 3D plot", feels like what I'm looking for to visualise my data. Unfortunately, I found out that ggplot does not have 3D functionality. There is a four-year-old package called gg3D; is plotly perhaps the best candidate for this? The final figure in that post looks like what I'm trying to achieve.
