Plotting each column of a dataframe as one line using ggplot


The whole dataset describes a module (or cluster if you prefer).

To reproduce the example, the dataset (a 54 KB file) is available at: https://www.dropbox.com/s/y1905suwnlib510/example_dataset.txt?dl=0

You can read it with:

dataset <- read.table(file='example_dataset.txt')

In the plot I want, the x-axis is my Timepoints column and the y-axis shows every column of the dataset except the last 3, with one line per column. I then used facet_wrap() to split the plot into panels by the ConditionID column.

This is exactly what I want, but I achieved it with the following code:

plot <- ggplot(dataset, aes(x=Timepoints))
plot <- plot + geom_line(aes(y=dataset[,1],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,2],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,3],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,4],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,5],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,6],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,7],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,8],colour = dataset$InModule))
...

As you can see, this is not very automated. I thought about putting it in a loop, like this:

columns <- ncol(dataset) - 3
for (i in seq_len(columns))
{
  plot <- plot + geom_line(aes(y=dataset[,i], colour = dataset$InModule))
}
(plot <- plot + facet_wrap(~ ConditionID, ncol=6))
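The loop most likely fails because aes() does not evaluate `dataset[,i]` when the layer is added: the expression is stored and only evaluated when the plot is built for drawing, and by then `i` holds its final value, so every layer draws the same (last) column. A minimal sketch of one workaround, assuming ggplot2 3.0 or later (for `.data` and `!!` support inside aes()) and using a small hypothetical stand-in for the real dataset (the gene1..gene3 column names are invented): inject each column name with `!!` so it is fixed at the moment the layer is created.

```r
library(ggplot2)

# Hypothetical stand-in for the real data: measurement columns first,
# then the three metadata columns (Timepoints, InModule, ConditionID)
set.seed(1)
dataset <- data.frame(
  gene1 = rnorm(8), gene2 = rnorm(8), gene3 = rnorm(8),
  Timepoints = rep(1:4, times = 2),
  InModule = rep(c(TRUE, FALSE), each = 4),
  ConditionID = rep(c("A", "B"), each = 4)
)

# All columns except the last three hold the values to plot
value_cols <- names(dataset)[seq_len(ncol(dataset) - 3)]

plot <- ggplot(dataset, aes(x = Timepoints))
for (col in value_cols) {
  # !! inserts the current value of `col` now, instead of storing a
  # reference to the loop variable that would be looked up at draw time
  plot <- plot + geom_line(aes(y = .data[[!!col]], colour = InModule))
}
plot <- plot + facet_wrap(~ ConditionID, ncol = 6)
```

This works, but it still adds one layer per column; reshaping to long format, as in the answer below, is usually the more idiomatic route.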

That doesn't work. I found the question "Use for loop to plot multiple lines in single plot with ggplot2", which corresponds to my problem, and tried the solution given there using the melt() function.

The problem is that when I use melt() on my dataset, I lose the Timepoints column that I need for the x-axis. This is what I did:

library(reshape2)  # provides melt()
data_melted <- dataset
as.character(data_melted$Timepoints)  # only prints the result; it is not assigned back
dataset_melted <- melt(data_melted)
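As far as I know, reshape2's melt() for data frames, when neither id.vars nor measure.vars is given, treats the discrete columns (such as factors and character vectors) as identifiers and melts everything else, so a numeric Timepoints column is stacked into `value` together with the measurements. A small sketch on hypothetical data (the gene1..gene3 names are invented) comparing the default call with one that names the id columns explicitly:

```r
library(reshape2)

# Hypothetical stand-in: numeric measurements, then the three metadata columns
set.seed(1)
dataset <- data.frame(
  gene1 = rnorm(8), gene2 = rnorm(8), gene3 = rnorm(8),
  Timepoints = rep(1:4, times = 2),
  InModule = rep(c("yes", "no"), each = 4),
  ConditionID = rep(c("A", "B"), each = 4),
  stringsAsFactors = FALSE
)

# Default: only the character columns become ids, so the numeric
# Timepoints column is melted as if it were a measurement
m_default <- melt(dataset)

# Naming the id columns keeps Timepoints as its own column for the x-axis
m_ids <- melt(dataset, id.vars = c("Timepoints", "InModule", "ConditionID"))
```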

I also tried using aggregate():

aggdata <- aggregate(dataset, by=list(dataset$ConditionID), FUN=length)

With aggdata I at least know how many Timepoints each ConditionID has, but I don't know how to proceed from here and combine this with ggplot.
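aggregate() with FUN = length returns, for every column, the number of rows in each ConditionID group. If only the number of time points per condition is needed, a shorter base-R sketch (on hypothetical data) is to count rows with table() or with the formula form of aggregate(); with long-format data, as in the answer below, ggplot2 does not need these counts at all.

```r
# Hypothetical stand-in: condition A has 4 time points, condition B has 3
dataset <- data.frame(
  gene1 = rnorm(7),
  Timepoints = c(1:4, 1:3),
  ConditionID = c(rep("A", 4), rep("B", 3))
)

# Number of rows (time points) per condition, as a named table
counts_table <- table(dataset$ConditionID)

# Same information as a data frame, via the formula interface of aggregate()
counts_df <- aggregate(Timepoints ~ ConditionID, data = dataset, FUN = length)
```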

Can anyone suggest an approach? I know I could use the ugly solution of building new datasets in a loop with rbind() (also given in that link), but I don't want to do that, as it seems really inefficient. I want to learn the right way.

Thanks

Accepted answer, by shadow:

You have to specify id.vars in your call to melt.data.frame to keep all the information you need. In the call to ggplot you then need to specify the correct grouping variable to get the same result as before. Here's a possible solution:

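library(reshape2)
library(ggplot2)
# Keep Timepoints, InModule and ConditionID as ids; all other columns are stacked into variable/value
melted <- melt(dataset, id.vars=c("Timepoints", "InModule", "ConditionID"))
p <- ggplot(melted, aes(Timepoints, value, color = InModule)) +
  geom_line(aes(group=paste0(variable, InModule)))  # one line per original column
p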
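To get the faceted layout from the question, the answer's plot can be extended with the same facet_wrap() call. The sketch below also swaps in tidyr::pivot_longer() (tidyr 1.0 or later), the newer replacement for reshape2's melt(); the data frame and its gene1..gene3 column names are hypothetical, and names_to/values_to are set so the long table has the same variable/value columns melt() produces, leaving the ggplot call unchanged.

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in: measurement columns, then the three metadata columns
set.seed(1)
dataset <- data.frame(
  gene1 = rnorm(8), gene2 = rnorm(8), gene3 = rnorm(8),
  Timepoints = rep(1:4, times = 2),
  InModule = rep(c("yes", "no"), each = 4),
  ConditionID = rep(c("A", "B"), each = 4)
)

# Long format: one row per (row of dataset, measurement column)
long <- pivot_longer(
  dataset,
  cols = -c(Timepoints, InModule, ConditionID),
  names_to = "variable",
  values_to = "value"
)

# One line per original column, one panel per condition
p <- ggplot(long, aes(Timepoints, value, colour = InModule)) +
  geom_line(aes(group = paste0(variable, InModule))) +
  facet_wrap(~ ConditionID, ncol = 6)
```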