I have a data frame, DF, shown below. It records, for two students over 3 terms, which subjects they took and whether they passed or failed each one.
I would like to draw a parallel coordinates plot that traces the students. I want to see which path each student takes to reach the end.
ID term subject result
1 1 math01 fail
1 1 Phys01 pass
1 1 chem01 pass
1 2 math01 pass
1 2 math02 fail
1 3 math02 fail
1 3 cmp01 pass
2 1 math01 fail
2 1 phys01 pass
2 2 math01 pass
2 2 math02 pass
2 3 cmp01 pass
The desired result would be similar to the image below.
Each block in each term shows a subject taken, labelled with the subject name and the value of the result column (fail or pass). The size of a block should correspond to the number of times that subject was taken with that result. For example, if most students fail math01 in term 1, the math01fail block should be the biggest block under term 1.
The connecting lines link the subjects the students took in one term to those they took in the next term. The thickness of a line corresponds to the number of connections at that point. For example, if many students fail math01 (math01fail) in term 1, retake math01 in term 2 and pass it (math01pass), the line from math01fail to math01pass should be thicker, in proportion to the number of occurrences.
I think you're better off approaching this problem from a graph point of view, rather than in the context of parallel coordinates.
Here is what I would do:
Start by loading the necessary libraries.
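The answer's original library calls are not shown here, so the following is only a minimal sketch. It assumes dplyr (suggested by the use of n()) and igraph are the packages needed, and it rebuilds the question's table as df with read.table so that later steps have something to run on.

```r
# Assumption: dplyr and igraph are the libraries meant; the original calls are not shown.
library(dplyr)
library(igraph)

# Sample data copied from the question's table.
df <- read.table(text = "
ID term subject result
1 1 math01 fail
1 1 Phys01 pass
1 1 chem01 pass
1 2 math01 pass
1 2 math02 fail
1 3 math02 fail
1 3 cmp01 pass
2 1 math01 fail
2 1 phys01 pass
2 2 math01 pass
2 2 math02 pass
2 3 cmp01 pass
", header = TRUE, stringsAsFactors = FALSE)
```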
First we define the edge list of our graph. To do so, we do a self-join of df by ID, and select rows that correspond to consecutive (increasing) terms. Every row then corresponds to a link from term i to i+1 for every student.
We add a weight column that is proportional to the number of students for every edge. The reason why we don't simply do weight = n() is purely aesthetic: we like to have thicker lines for edges shared by more than one student.
Next we define a node list. The key here is to add columns x and y, which will determine the grid layout of the nodes.
Note that entries in the first column of nl need to match those in the first two columns of el.
We are now ready to construct an igraph from both data.frames and plot the graph.
The resulting plot may need some further tweaking/polishing, but this should get you started.
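The steps above could be sketched roughly as follows. This is not the answer's original code, which is not shown, but one possible reading of it. The node label format (T<term>_<subject><result>), the weight factor of 3, the count column, the lower-casing of subject names (treating Phys01 and phys01 as the same subject) and all plotting parameters are assumptions made for illustration. The data is rebuilt inline so the block runs on its own.

```r
library(dplyr)
library(igraph)

# Sample data from the question, rebuilt so this block is self-contained.
df <- data.frame(
  ID      = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  term    = c(1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3),
  subject = c("math01", "Phys01", "chem01", "math01", "math02", "math02", "cmp01",
              "math01", "phys01", "math01", "math02", "cmp01"),
  result  = c("fail", "pass", "pass", "pass", "fail", "fail", "pass",
              "fail", "pass", "pass", "pass", "pass")
)

# One node per term/subject/result combination.
# Assumption: "Phys01" vs "phys01" is a typo, so subjects are lower-cased.
nodes <- df %>%
  mutate(subject = tolower(subject),
         node = paste0("T", term, "_", subject, result))

# Edge list: self-join by ID, keep pairs of consecutive terms, count students.
pairs <- merge(nodes, nodes, by = "ID", suffixes = c("_from", "_to"))
el <- pairs %>%
  filter(term_to == term_from + 1) %>%
  count(from = node_from, to = node_to) %>%
  mutate(weight = 3 * n)  # hypothetical scaling factor, chosen only for visible line widths

# Node list: x is the term, y stacks the nodes within a term (grid layout).
nl <- nodes %>%
  count(node, term, name = "count") %>%
  group_by(term) %>%
  mutate(y = row_number()) %>%
  ungroup() %>%
  transmute(node, x = term, y, count)

# The first column of nl supplies the vertex names used in el's first two columns.
g <- graph_from_data_frame(el, directed = TRUE, vertices = nl)

plot(
  g,
  layout = as.matrix(nl[, c("x", "y")]),  # fixed grid positions
  vertex.shape = "rectangle",
  vertex.size = 20 + 10 * V(g)$count,     # block width grows with the count
  vertex.size2 = 10,
  vertex.label.cex = 0.7,
  edge.width = E(g)$weight,               # thicker lines for shared transitions
  edge.arrow.size = 0.4
)
```

Because the layout is passed explicitly, the terms appear as columns from left to right. The vertical order within a term is just the row order of nl and can be rearranged by changing y.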