R: clusters in histogram


I have 6000 reports, and for each report I know how much garbage it contains. So I can make a histogram of this:

boundaries <- seq(0, 1, by = 0.01)
hist(hoeveel_rommel_per_rapport, breaks = boundaries)
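To see what this histogram counts, hist() can return the bin counts without drawing anything when plot = FALSE is set. A minimal sketch, using made-up uniform data as a stand-in for the real hoeveel_rommel_per_rapport vector (an assumption, since the real values are not shown):

```r
set.seed(1)
# Made-up stand-in for the real data: 6000 garbage fractions in [0, 1]
hoeveel_rommel_per_rapport <- runif(6000)
boundaries <- seq(0, 1, by = 0.01)  # 101 break points, so 100 bins

# plot = FALSE returns the "histogram" object without drawing it
h <- hist(hoeveel_rommel_per_rapport, breaks = boundaries, plot = FALSE)
length(h$counts)  # one count per bin
sum(h$counts)     # every report falls in exactly one bin
```

These per-bin counts are what a colored version has to split up by cluster.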

where hoeveel_rommel_per_rapport is a vector holding the amount of garbage in each report.

I also have a cluster number for each report. I want to give each cluster its own color in the histogram. Is this possible?

For example, if the first bar contains reports from 3 different clusters, it should be split into 3 colors.

Answer by jlhoward:

I'd be inclined to use ggplot for something like this. Here are some approaches using made-up data (in the future, you should provide your data, or at least a representative sample).

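set.seed(1)   # for a reproducible example
# 900 made-up reports: garbage scores from chi-squared draws (df recycled as 10, 15, 20),
# with cluster labels recycled as A, B, C in step with the df values
reports <- data.frame(garbage = rchisq(900, c(10, 15, 20)) / 50, cluster = LETTERS[1:3])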
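Because rchisq() recycles its df argument and data.frame() recycles the three cluster labels, row i gets df 10, 15 or 20 and cluster A, B or C in step, so each made-up cluster has its own distribution. A chi-squared variable has mean equal to its df, so after dividing by 50 the cluster means should be roughly 0.2, 0.3 and 0.4. A quick sketch to check that:

```r
set.seed(1)
reports <- data.frame(garbage = rchisq(900, c(10, 15, 20)) / 50, cluster = LETTERS[1:3])

# Each label is recycled 900 / 3 times
table(reports$cluster)

# Per-cluster mean garbage score, expected to be close to df / 50
m <- tapply(reports$garbage, reports$cluster, mean)
m
```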

This seems to be what you were looking for: a stacked histogram.

library(ggplot2)
ggplot(reports) +
  geom_histogram(aes(x = garbage, fill = cluster), binwidth = 0.01)
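With the question's own data the cluster numbers are presumably numeric. If so, they should be wrapped in factor() before being mapped to fill, because ggplot2 only forms groups (and hence stacked segments) from discrete variables. Adding boundary = 0 makes the 0.01-wide bins start at 0, like the breaks in the question. A sketch under those assumptions; the data are made up and cluster_per_report is a hypothetical name for the question's cluster vector:

```r
library(ggplot2)

set.seed(1)
# Made-up stand-ins for the question's vectors (cluster_per_report is hypothetical)
hoeveel_rommel_per_rapport <- runif(6000)
cluster_per_report <- sample(1:3, 6000, replace = TRUE)

# factor() makes the cluster numbers discrete, so each cluster
# becomes its own group and gets its own fill color
reports <- data.frame(garbage = hoeveel_rommel_per_rapport,
                      cluster = factor(cluster_per_report))

# boundary = 0 aligns the 0.01-wide bins with seq(0, 1, by = 0.01)
p <- ggplot(reports) +
  geom_histogram(aes(x = garbage, fill = cluster), binwidth = 0.01, boundary = 0)
print(p)
```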

This puts the three histograms in separate panels, which is much clearer.

ggplot(reports) +
  geom_histogram(aes(x = garbage, fill = cluster), binwidth = 0.01) +
  facet_wrap(~cluster, ncol = 1)

Overlapping density plots in one panel.

ggplot(reports) +
  stat_density(aes(x = garbage, fill = cluster), position = "identity", alpha = 0.5)

Answer by llrs:

You can pass the col argument with the colors you want. I'm not sure whether passing a vector of colors such as col = c("green", "red", "blue") will do what you want, but you can certainly choose which colors the bars have.
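With base hist(), a vector passed to col is recycled across the bars, so it colors successive bars rather than clusters. One base-R alternative (a sketch, not part of the answer) is to count reports per bin and cluster with cut() and table(), then draw a stacked barplot() with one color per cluster. The data and the cluster vector below are made up; cluster stands in for the per-report cluster numbers from the question:

```r
set.seed(1)
# Made-up data: garbage fraction per report and a hypothetical cluster number (1 to 3)
hoeveel_rommel_per_rapport <- runif(6000)
cluster <- sample(1:3, 6000, replace = TRUE)
boundaries <- seq(0, 1, by = 0.01)

# Assign each report to the same right-closed bins hist() uses by default
bins <- cut(hoeveel_rommel_per_rapport, breaks = boundaries, include.lowest = TRUE)

# Rows = clusters, columns = bins; each cell counts reports in that cluster and bin
counts <- table(cluster, bins)

# A matrix passed to barplot() is drawn as stacked bars, one color per row (cluster);
# space = 0 removes the gaps between bars so it looks like a histogram
barplot(counts, space = 0, border = NA,
        col = c("green", "red", "blue"), legend.text = TRUE)
```

The column sums of counts are the bar heights of the plain hist() call, so the stacked bars keep the outline of the original histogram.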