Multiple color lines in ggparcoord with facet_wrap


I have a data frame with data in the following format:

Month1  Month2  Month3  Month4  Month5  Month6  Month7  Month8  Month9  Month10 Month11 Month12 Month13 Month14 Month15 Type    Subject
2.5617749   2.3900798   2.4261968   3.2463769   2.8622897   2.9429682   3.3301257   2.5712439   2.1379820   2.1297074   1.8171952   1.3065964   0.6729969   0.2342636   0.2643012   Filing 1    Tools of the Trade
2.6787155   3.3005452   3.2765383   3.2594204   3.1994482   2.9489934   3.0170951   2.9648050   2.5933965   2.7525476   2.6949229   2.7816262   2.6125091   2.7238804   2.4219048   Filing 1    Who's at the Door?
1.3769416   1.7417689   1.5411681   1.6315268   1.4034428   2.0020882   1.5563825   1.1329947   1.1466544   1.4037866   1.2279484   1.0863116   1.1081301   0.9657535   0.9496937   ProcessServing 1    Adobe Acrobat
1.5634082   1.9899706   1.8965844   2.0455116   2.0640787   1.8585767   1.4652345   1.5646704   0.9417121   1.5804423   1.3644140   0.8991399   0.8865172   1.4111854   1.1476721   ProcessServing 1    EService

This is just sample data; the full data set has 4 Type and 7 Subject categories. Here's the output of dput(head(avgRevenueBySubject)):

structure(list(Month1 = c(2.32452852540217, 2.39838024319443, 
1.38763119669326, 1.67197010492586, 2.39230240910008, 2.56177491674571
), Month2 = c(2.25983235807464, 2.80008703157276, 1.92684823894878, 
1.81781945992201, 3.11274605464608, 2.39007978845121), Month3 = c(2.45378041585838, 
2.73603115114115, 2.15154625461568, 2.28897180500678, 3.2072070366587, 
2.42619683055328), Month4 = c(2.50950054817085, 2.89118356394795, 
2.19502538520019, 2.28141567102663, 3.0504767706406, 3.24637686954766
), Month5 = c(2.53858195315855, 2.5939498734771, 2.35786859462019, 
2.24828684346212, 3.02313315871281, 2.86228969522596), Month6 = c(2.20551945443653, 
2.11372073519497, 2.24466703665554, 2.17193033864937, 2.70377966653074, 
2.94296818999896), Month7 = c(2.09246043688626, 2.50841794197685, 
2.30673064217475, 1.91220323933604, 2.7369954829105, 3.33012570803583
), Month8 = c(2.22553189078165, 2.44113695766249, 2.26140266497664, 
1.764621178248, 2.62183982786095, 2.57124386952199), Month9 = c(1.99424045532198, 
1.9091795918852, 2.20375474567921, 1.75651288161892, 2.40383936923673, 
2.13798204834703), Month10 = c(2.15229842709522, 2.52246522784505, 
2.01002146553544, 1.74130180371386, 2.53194432666157, 2.12970742947938
), Month11 = c(2.26866642573734, 2.21939880654197, 1.96811894944027, 
1.54314755700399, 2.81563101112808, 1.81719515748861), Month12 = c(2.21540768941806, 
2.09996453939828, 2.14269489036386, 1.69009446249139, 2.52435113546707, 
1.30659644388318), Month13 = c(2.01407795696169, 2.19110438349199, 
2.08594091270487, 1.66310710284536, 2.30479375587374, 0.672996949673077
), Month14 = c(1.85702016208139, 2.18375170870693, 2.28394628775105, 
1.64612604028705, 2.51616863736761, 0.234263615828042), Month15 = c(1.7562791061015, 
2.38349140169948, 1.96156382849473, 1.78529825283472, 2.36734279344632, 
0.264301216598792), Type = structure(c(2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("eServices 1", "Filing 1", "ProcessServing 1", 
"Research 1"), class = "factor"), Subject = c("Adobe Acrobat", 
"EService", "OCeFiling", "SD eFiling", "Saving Trees & Time", 
"Tools of the Trade")), .Names = c("Month1", "Month2", "Month3", 
"Month4", "Month5", "Month6", "Month7", "Month8", "Month9", "Month10", 
"Month11", "Month12", "Month13", "Month14", "Month15", "Type", 
"Subject"), row.names = c(NA, 6L), class = "data.frame")

I'm trying to plot this information using the following code:

q <- ggparcoord(data = avgRevenueBySubject,
                columns = 1:15, 
                groupColumn = 17, 
                showPoints = FALSE, 
                alphaLines = 0.3,
                shadeBox = NULL,
                scale = "globalminmax",
                title = "Average Revenue by Training Subject"
)  +
  geom_vline(aes(xintercept=3.5),color='blue',linetype="dashed", size=1) +
  facet_wrap(~Subject,scales='fixed', nrow = 4) + geom_line(size=1)
q <- q + theme_minimal() + xlab('Months') + ylab('Average Revenue (on log scale)') +
  theme(legend.position = "none") + theme(axis.text.y = element_text(hjust=0, angle=0), 
                                          axis.text.x = element_text(hjust=1, angle=45),
                                          plot.title = element_text(size=20))
q

and I get the following plot:

[Image: the resulting faceted plot]

As the plot shows, each facet gets a different color, and every line within a single facet has the same color.

I would like the 4 lines within each facet to have different colors, and I would like those colors to be the same across all facets.

Any help would be much appreciated.

There is 1 answer

MrFlick On BEST ANSWER

As far as I can tell, ggparcoord drops columns from the data set it does not use. So if you want to use a variable in the facet that you did not reference in ggparcoord(), then you're going to have a problem.

One workaround is to modify the data in the ggplot object directly. Normally I'd say this is a bad idea, but right now I don't see any other way.

library(GGally)   # provides ggparcoord()
library(ggplot2)

# build the base plot, coloring the lines by Type
q <- ggparcoord(data = avgRevenueBySubject,
                columns = 1:15,
                showPoints = FALSE,
                alphaLines = 0.3,
                groupColumn = "Type",
                shadeBox = NULL,
                scale = "globalminmax",
                title = "Average Revenue by Training Subject"
)
# data to merge: maps the row id (.ID) that ggparcoord adds back to Subject
mm <- cbind.data.frame(.ID = 1:nrow(avgRevenueBySubject), Subject = avgRevenueBySubject$Subject)
# merge Subject into the plot's data (joins on the shared .ID column)
q$data <- merge(q$data, mm)
# finish plot commands
q <- q + geom_vline(aes(xintercept = 3.5), color = 'blue', linetype = "dashed", size = 1) +
  facet_wrap(~Subject, scales = 'fixed', nrow = 4) + geom_line(size = 1)
q <- q + theme_minimal() + xlab('Months') + ylab('Average Revenue (on log scale)') +
  theme(legend.position = "none") + theme(axis.text.y = element_text(hjust = 0, angle = 0),
                                          axis.text.x = element_text(hjust = 1, angle = 45),
                                          plot.title = element_text(size = 20))
# display the plot
q
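
If modifying the plot object feels too fragile, another option might be to skip ggparcoord altogether and draw the lines with plain ggplot2 after reshaping the data to long format. This is a different technique from the workaround above, not what ggparcoord does internally. The sketch below is only a rough illustration: it uses the first three months of the four sample rows from the question, and the names df, month_cols, long and p are made up for the example. Since scale = "globalminmax" applies no per-column scaling (as far as I understand it), plotting the raw values should give a comparable picture.

```r
library(ggplot2)

# First three months of the four sample rows shown in the question
df <- data.frame(
  Month1 = c(2.5617749, 2.6787155, 1.3769416, 1.5634082),
  Month2 = c(2.3900798, 3.3005452, 1.7417689, 1.9899706),
  Month3 = c(2.4261968, 3.2765383, 1.5411681, 1.8965844),
  Type = c("Filing 1", "Filing 1", "ProcessServing 1", "ProcessServing 1"),
  Subject = c("Tools of the Trade", "Who's at the Door?",
              "Adobe Acrobat", "EService")
)

month_cols <- grep("^Month", names(df), value = TRUE)

# stack() appends the month columns one after another, producing the columns
# 'values' and 'ind' (a factor whose levels keep the column order), so Type
# and Subject are repeated once per month column to line up with it
long <- stack(df[month_cols])
long$Type <- rep(df$Type, times = length(month_cols))
long$Subject <- rep(df$Subject, times = length(month_cols))

# one line per Type/Subject combination, colored by Type, one facet per Subject
p <- ggplot(long, aes(x = ind, y = values,
                      group = interaction(Type, Subject), colour = Type)) +
  geom_line() +
  facet_wrap(~Subject) +
  xlab("Months") + ylab("Average Revenue (on log scale)")
p
```

With the full data set you would replace df with avgRevenueBySubject. Because colour is mapped to Type in a single shared scale, each Type should get the same color in every facet. Note that in this small sample each Subject has only one Type, so each facet shows a single line.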