Creating a beautiful heat map with normalized frequencies per column


I am studying the relationship between the concentration of a metabolite (hypocretin) and a continuous outcome variable (BMI). I want to create a frequency heat map with the metabolite concentration on the X-axis and the BMI values on the Y-axis, where the fill colour shows how often each BMI value occurs. I have found an ideal picture on the Internet.

I want to get a heatmap with the following attributes:

The heat map shows the proportion of values within their respective columns. In other words, the frequencies in each column add up to 1, rather than the frequencies across all columns combined adding up to 1.
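To make the per-column normalization concrete, here is a minimal base R sketch. The `toy` data frame and its values are made up for illustration and are not taken from my data.

```r
# Toy data (made up): three concentration levels with unequal sample sizes
toy <- data.frame(
  level = c(1, 1, 1, 1, 2, 2, 3),
  bmi   = c(20, 20, 25, 30, 20, 25, 30)
)

# Count observations per cell; rows are BMI values, columns are levels
counts <- table(bmi = toy$bmi, level = toy$level)

# Divide every column by its own total so that each column sums to 1
col_freq <- prop.table(counts, margin = 2)

print(col_freq)
print(colSums(col_freq))
```

With `margin = 2` the normalization is done separately for each column, so a level with few observations is shown on the same 0 to 1 scale as a level with many.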

When I plot our data with ggplot2, I get the following heat map (corresponding to code 1 below).

# Code 1: per-column frequencies plotted directly with geom_tile
library(data.table)
library(ggplot2)
library(viridis)   # provides scale_fill_viridis()

# Count rows per (Hypocretin.level, BMI) pair, then normalize within each level
data_dt <- as.data.table(BMI_data)
df3 <- data_dt[, .(n = .N), keyby = .(Hypocretin.level, BMI)][
  , freq := prop.table(n), by = "Hypocretin.level"][]

BMI_dot_plot2 <- ggplot(df3, aes(x = Hypocretin.level, y = BMI, fill = freq)) +
  geom_tile(colour = "white", size = 0, width = 1, height = 1) +
  labs(title = "Heat Map of the Effect of Hypocretin.Level on Body Mass Index",
       subtitle = "Where BMI ranges between 0 and 60 with 18.5 - 24.9 being considered healthy (UK National Health Service, 2020)",
       x = "Hypocretin Level/pg/mL(1:0-40;2:40-110;3:110-200;4:200+)",
       y = "Body Mass Index (BMI)",
       fill = " Freq") +
  scale_x_continuous(breaks = 1:18) +
  scale_y_continuous(breaks = c(15, 20, 30, 40, 50)) +
  scale_fill_viridis() +
  mytheme1   # my own custom theme, defined elsewhere (not shown)
BMI_dot_plot2

Figure: heat map with normalization per metabolite concentration but no smoothing.

This code, however, does not include any smoothing, which makes the graph difficult to interpret.

Unfortunately, I have a lot of data points for some of the neurometabolite concentrations (1 and 18) but far fewer for the others (2-17). I am mainly interested in these sparse intermediate values, so I would like to normalize the frequencies per metabolite concentration (so that each concentration, i.e. each column, adds up to 100%) to better visualize a possible trend.

I then calculated the frequencies myself and tried a second version (code 2):

# Code 2: fill missing (level, BMI) combinations with 0 before plotting
library(data.table)
library(reshape2)  # acast() and melt() for matrices
library(ggplot2)
library(viridis)   # provides scale_fill_viridis()

# Count rows per (Hypocretin.level, BMI) pair, then normalize within each level
data_dt <- as.data.table(BMI_data)
df_BMI <- data_dt[, .(n = .N), keyby = .(Hypocretin.level, BMI)][
  , freq := prop.table(n), by = "Hypocretin.level"][]

# Reshape to a level x BMI matrix; combinations that never occur become NA
sample_matrix <- acast(df_BMI, Hypocretin.level ~ BMI, value.var = "freq")
sample_matrix[is.na(sample_matrix)] <- 0   # frequency of missing combinations set to 0
sample_matrix

# Back to a long data frame with columns Var1 (level), Var2 (BMI), value (freq)
data_BMI <- reshape2::melt(sample_matrix)

BMI_dot_plot2 <- ggplot(data_BMI, aes(Var1, Var2)) +
  geom_tile(aes(fill = value), width = 1, height = 1) +
  labs(title = "Heat Map of the Effect of Hypocretin.Level on Body Mass Index",
       subtitle = "Where BMI ranges between 0 and 60 with 18.5 - 24.9 being considered healthy (UK National Health Service, 2020)",
       x = "Hypocretin Level/pg/mL(1:0-40;2:40-110;3:110-200;4:200+)",
       y = "Body Mass Index (BMI)",
       fill = " Freq") +
  # scale_x_continuous(breaks = 1:18) +
  # scale_y_continuous(breaks = c(20, 30, 40)) +
  scale_fill_viridis() +
  coord_fixed(ratio = 1) +
  mytheme1   # my own custom theme, defined elsewhere (not shown)
BMI_dot_plot2

Then I get a plot that more closely matches my ideal plot, but it's just plain ugly.


The BMI column has more than 300 unique values but, as mentioned above, some hypocretin groups have only 3 observations. The figure therefore contains a large number of bins with a frequency of 0, which makes it very hard to read.
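One way I could reduce the number of empty cells is to group BMI into wider bins before normalizing. The sketch below uses simulated stand-in data, not my real data, and the bin width of 5 is an arbitrary assumption; `sim`, `bin_width` and `freq_df` are names made up for this example.

```r
set.seed(1)

# Simulated stand-in for BMI_data: one dense level and two sparse ones
sim <- data.frame(
  Hypocretin.level = c(rep(1, 200), rep(2, 3), rep(3, 5)),
  BMI = round(c(rnorm(200, 27, 5), rnorm(3, 24, 3), rnorm(5, 30, 4)), 1)
)

# Replace each BMI value by the lower edge of its bin (width is an assumption)
bin_width <- 5
sim$BMI_bin <- floor(sim$BMI / bin_width) * bin_width

# Count per (bin, level), then normalize within each level (column)
counts  <- table(BMI_bin = sim$BMI_bin, level = sim$Hypocretin.level)
freq_df <- as.data.frame(prop.table(counts, margin = 2), responseName = "freq")
freq_df$BMI_bin <- as.numeric(as.character(freq_df$BMI_bin))

# Optional plot, only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(freq_df, ggplot2::aes(x = level, y = BMI_bin, fill = freq)) +
    ggplot2::geom_tile()
}
```

Wider bins trade resolution on the BMI axis for fewer zero cells, so the bin width would need to be chosen with the sparse groups in mind.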

What code should I use to have the frequencies normalized per neurometabolite concentration to 100% (as in code 2), but also have the data smoothed or otherwise made more readable, so that it looks like the heat map in the first picture of code 1? Here is the link to the raw data: https://drive.google.com/file/d/1ZhUgrXZ2iRG8LrysMSV19I5LWkJz1OjI/view.
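To illustrate what I mean by "smoothed and normalized per column", here is a rough base R sketch that estimates a kernel density for each concentration on a common BMI grid and rescales each column to sum to 1. It runs on simulated data, the grid and the bandwidth of 3 are arbitrary assumptions, and `smooth_col` is a hypothetical helper. I am not sure this is an appropriate approach for groups with only a few observations.

```r
set.seed(1)

# Simulated stand-in for BMI_data: one dense level and two sparse ones
sim <- data.frame(
  Hypocretin.level = c(rep(1, 200), rep(2, 3), rep(3, 5)),
  BMI = c(rnorm(200, 27, 5), rnorm(3, 24, 3), rnorm(5, 30, 4))
)

# Common BMI grid on which every column is evaluated (range is an assumption)
grid <- seq(10, 60, by = 1)

# Hypothetical helper: kernel density on the grid, rescaled to sum to 1
smooth_col <- function(bmi, bw = 3) {
  d <- density(bmi, bw = bw, from = min(grid), to = max(grid), n = length(grid))
  d$y / sum(d$y)
}

# One smoothed column per concentration level; rows correspond to grid values
smoothed <- sapply(split(sim$BMI, sim$Hypocretin.level), smooth_col)
rownames(smoothed) <- grid

print(colSums(smoothed))
```

The resulting matrix could be converted to long format and passed to `geom_tile()` in the same way as `sample_matrix` in code 2. With only 3 points in a column the result depends almost entirely on the chosen bandwidth.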

If you know of other, better ways to visualize this, please let me know. Thank you!
