Flatten an unbalanced (ragged) hierarchy

Question

Asked by gcbm1984 on 13 January 2024 at 23:17 · 82 views

I have a .csv file that lists a ragged hierarchy vertically, one item per row. The indent_nbr column gives each item's level in the hierarchy, with 0 being the top parent.

   item indent_nbr
1     A          0
2     B          1
3     C          2
4     D          3
5     E          4
6     F          4
7     G          4
8     H          5
9     I          5
10    J          5
11    K          5
12    L          4
13    M          5
14    N          5
15    O          5
16    P          5
17    Q          3
18    R          4

I want to flatten this hierarchy to look like this matrix.

      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,] "A"  "B"  "C"  "D"  "E"  NA  
 [2,] "A"  "B"  "C"  "D"  "F"  NA  
 [3,] "A"  "B"  "C"  "D"  "G"  "H" 
 [4,] "A"  "B"  "C"  "D"  "G"  "I" 
 [5,] "A"  "B"  "C"  "D"  "G"  "J" 
 [6,] "A"  "B"  "C"  "D"  "G"  "K" 
 [7,] "A"  "B"  "C"  "D"  "L"  "M" 
 [8,] "A"  "B"  "C"  "D"  "L"  "N" 
 [9,] "A"  "B"  "C"  "D"  "L"  "O" 
[10,] "A"  "B"  "C"  "D"  "L"  "P" 
[11,] "A"  "B"  "C"  "Q"  "R"  NA

Can someone help me with this?

Please note that I'm limited to using the following packages: base, boot, class, cluster, codetools, compiler, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, Matrix, methods, mgcv, nlme, nnet, parallel, rpart, spatial, splines, stats, stats4, survival, tcltk, tools, translations, utils

Original Q&A

There is 1 answer

**jay.sf** · Answer 1 · 2024-01-14T08:36:23+00:00

To turn the hierarchical data into a matrix, we can first form groups g that advance whenever the indent level does not increase by exactly one from the previous row. Then we create an array a with one row per group and one column per level, and place each item in the cell given by its group and level. This gives the first complete path and only the end points of the following ones, leaving NAs that can be filled column-wise with the last non-NA value using Ruben's repeat_last (a user-written helper that has to be defined first), so no extra packages are needed. However, this fill also overwrites the true NAs, that is, paths that end before the deepest level, so we store their positions in na_ind beforehand and restore them afterwards.
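Since repeat_last is not defined in the answer itself, here is a minimal base R stand-in. It is a sketch that assumes the helper only needs to carry the last non-NA value forward along a vector while leaving a leading run of NAs untouched; the function by Ruben that the answer refers to may offer more options than this.

```r
# Hypothetical minimal stand-in for repeat_last (not the original helper):
# carries the last non-NA value forward; leading NAs stay NA.
repeat_last <- function(x) {
  ind <- which(!is.na(x))                 # positions of non-missing values
  if (length(ind) == 0L) return(x)        # nothing to carry forward
  if (is.na(x[1L])) ind <- c(1L, ind)     # keep a leading NA run as NA
  # repeat each anchor value up to the position before the next anchor
  rep(x[ind], times = diff(c(ind, length(x) + 1L)))
}

repeat_last(c(NA, "a", NA, NA, "b", NA))
```

With this definition in the workspace, the call apply(a, 2, repeat_last) inside hrr2mat fills each column downwards.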

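> hrr2mat <- \(dat) {
+   # new group whenever the indent does not grow by exactly 1
+   g <- c(0, cumsum(diff(dat$indent_nbr) != 1))
+   # empty matrix: one row per group, one column per level
+   a <- array(dim=c(length(table(g)), length(table(dat$indent_nbr))))
+   a[cbind(g + 1, dat$indent_nbr + 1)] <- dat$item
+   # first column in each row that must stay NA after filling
+   na <- apply(!is.na(a), 1, \(x) max(cumsum(diff(x) >= 0) + 1)) + 1
+   w <- which(na <= ncol(a))
+   # (row, col) index matrix of the true NAs; rbind also copes with rows that differ in their NA count
+   na_ind <- do.call(rbind, Map(cbind, w, lapply(na[w], `:`, ncol(a))))
+   # fill each column downwards, then restore the true NAs
+   a <- apply(a, 2, repeat_last)
+   a[na_ind] <- NA
+   return(a)
+ }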
> hrr2mat(dat)
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,] "A"  "B"  "C"  "D"  "E"  NA  
 [2,] "A"  "B"  "C"  "D"  "F"  NA  
 [3,] "A"  "B"  "C"  "D"  "G"  "H" 
 [4,] "A"  "B"  "C"  "D"  "G"  "I" 
 [5,] "A"  "B"  "C"  "D"  "G"  "J" 
 [6,] "A"  "B"  "C"  "D"  "G"  "K" 
 [7,] "A"  "B"  "C"  "D"  "L"  "M" 
 [8,] "A"  "B"  "C"  "D"  "L"  "N" 
 [9,] "A"  "B"  "C"  "D"  "L"  "O" 
[10,] "A"  "B"  "C"  "D"  "L"  "P" 
[11,] "A"  "B"  "C"  "Q"  "R"  NA

Not sure how it scales but might be a start.
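As a possible alternative that needs no fill helper, the same result can be built by walking the rows once and keeping track of the current root-to-node path. This is only a sketch, not the method of the answer above: the name flatten_paths is made up for illustration, and it assumes the rows are in depth-first order and that indent_nbr never increases by more than one from one row to the next.

```r
# Sample data from the question.
dat <- data.frame(
  item = LETTERS[1:18],
  indent_nbr = c(0, 1, 2, 3, 4, 4, 4, 5, 5, 5, 5, 4, 5, 5, 5, 5, 3, 4)
)

# Hypothetical helper: one output row per leaf, holding the path from the root.
flatten_paths <- function(item, indent) {
  n <- length(item)
  depth <- max(indent) + 1L
  # an item is a leaf if it is the last row or the next row is not deeper
  is_leaf <- c(indent[-1L] <= indent[-n], TRUE)
  out <- matrix(NA_character_, nrow = sum(is_leaf), ncol = depth)
  path <- rep(NA_character_, depth)
  r <- 0L
  for (i in seq_len(n)) {
    lvl <- indent[i] + 1L
    path[lvl] <- item[i]
    # forget everything below the current level
    if (lvl < depth) path[(lvl + 1L):depth] <- NA_character_
    if (is_leaf[i]) {
      r <- r + 1L
      out[r, ] <- path
    }
  }
  out
}

res <- flatten_paths(dat$item, dat$indent_nbr)
print(res)
```

On the sample data this gives the 11 x 6 matrix shown in the question. It runs in a single pass over the rows and uses only base R.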

Data:

> dput(dat)
structure(list(item = c("A", "B", "C", "D", "E", "F", "G", "H", 
"I", "J", "K", "L", "M", "N", "O", "P", "Q", "R"), indent_nbr = c(0, 
1, 2, 3, 4, 4, 4, 5, 5, 5, 5, 4, 5, 5, 5, 5, 3, 4)), class = "data.frame", row.names = c(NA, 
-18L))
