So I have a .csv file that displays a ragged hierarchy vertically. The indent_nbr column indicates the level each item is at in the hierarchy, with 0 being the top parent.
item indent_nbr
1 A 0
2 B 1
3 C 2
4 D 3
5 E 4
6 F 4
7 G 4
8 H 5
9 I 5
10 J 5
11 K 5
12 L 4
13 M 5
14 N 5
15 O 5
16 P 5
17 Q 3
18 R 4
I want to flatten this hierarchy to look like this matrix.
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A" "B" "C" "D" "E" NA
[2,] "A" "B" "C" "D" "F" NA
[3,] "A" "B" "C" "D" "G" "H"
[4,] "A" "B" "C" "D" "G" "I"
[5,] "A" "B" "C" "D" "G" "J"
[6,] "A" "B" "C" "D" "G" "K"
[7,] "A" "B" "C" "D" "L" "M"
[8,] "A" "B" "C" "D" "L" "N"
[9,] "A" "B" "C" "D" "L" "O"
[10,] "A" "B" "C" "D" "L" "P"
[11,] "A" "B" "C" "Q" "R" NA
Can someone help me with this?
Please note that I'm limited to using the following packages: base, boot, class, cluster, codetools, compiler, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, Matrix, methods, mgcv, nlme, nnet, parallel, rpart, spatial, splines, stats, stats4, survival, tcltk, tools, translations, utils
To turn hierarchical data into a matrix, we could first make groups `g` according to where the hierarchy changes. Then, we create an array `a` sized according to these groups and levels. Next, we place the items in this array based on where they fall in the hierarchy. This way we get the first complete sequence and the end points of the following ones, thus leaving NAs which can be filled column-wise with the last non-NA value using Ruben's `repeat_last`, so no extra packages are needed. However, this overwrites true NAs, which we store beforehand in `na_ind` and recover afterwards. Not sure how it scales, but it might be a start.
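The steps above can be sketched in base R roughly as follows. This is only a minimal sketch, not the original code: `fill_down` is a hypothetical stand-in for `repeat_last`, `flatten_hierarchy` is a made-up wrapper name, and the rule "a new group starts whenever the indent does not increase" is my assumption about what "where the hierarchy changes" means. The names `g`, `a` and `na_ind` simply mirror the description.

```r
# Example data as shown in the question.
dat <- data.frame(
  item = LETTERS[1:18],
  indent_nbr = c(0, 1, 2, 3, 4, 4, 4, 5, 5, 5, 5, 4, 5, 5, 5, 5, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for repeat_last: carry the last non-NA value
# forward; leading NAs stay NA.
fill_down <- function(x) {
  c(NA, x[!is.na(x)])[cumsum(!is.na(x)) + 1L]
}

flatten_hierarchy <- function(item, indent) {
  depth <- indent + 1
  # Assumption: a new group (output row) starts whenever the indent
  # does not increase relative to the previous item.
  g <- cumsum(c(TRUE, diff(indent) <= 0))
  # One row per group, one column per level.
  a <- matrix(NA_character_, nrow = max(g), ncol = max(depth))
  a[cbind(g, depth)] <- as.character(item)
  # Cells deeper than a row's deepest item are true NAs; remember them.
  row_depth <- as.vector(tapply(depth, g, max))
  na_ind <- col(a) > row_depth[row(a)]
  # Fill each column downwards with the last non-NA value.
  for (j in seq_len(ncol(a))) a[, j] <- fill_down(a[, j])
  # Restore the true NAs that the fill overwrote.
  a[na_ind] <- NA
  a
}

res <- flatten_hierarchy(dat$item, dat$indent_nbr)
print(res)
```

For the question's data this should give the 11 x 6 matrix shown in the question. The fill is done column by column in a plain loop rather than with `apply`, so that a single-row result stays a matrix.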
Data: