R: Pruning a data.tree without altering the original tree


In the data.tree package, pruning a tree permanently alters it. This is problematic because my data.tree takes a long time to generate, and I don't want to generate a new one every time I need a different pruning.

Here I generate a data.tree

# Loading data and library
library(data.tree)
data(acme)

# Function to add cumulative costs to all nodes
Cost <- function(node) {
  result <- node$cost
  if(length(result) == 0) result <- sum(sapply(node$children, Cost))
  return (result)
}

# Adding costs and other data for my example
acme$Do(function(node) node$cost <- Cost(node), filterFun = isNotLeaf)
acme$IT$Outsource$AddChild("Tester Inc")
acme$IT$Outsource$`Tester Inc`$cost <- 10
print(acme, "p", "cost")
                          levelName    p    cost
1  Acme Inc.                          NA 4950000
2   ¦--Accounting                     NA 1500000
3   ¦   ¦--New Software             0.50 1000000
4   ¦   °--New Accounting Standards 0.75  500000
5   ¦--Research                       NA 2750000
6   ¦   ¦--New Product Line         0.25 2000000
7   ¦   °--New Labs                 0.90  750000
8   °--IT                             NA  700000
9       ¦--Outsource                0.20  400000
10      ¦   °--Tester Inc             NA      10
11      ¦--Go agile                 0.05  250000
12      °--Switch to R              1.00   50000

Here I prune the tree.

# Pruner function: keep a node only if its cost is below 2,800,000
# and its parent's cost is above 2,800,000
Pruner <- function(node) {
  cost <- node$cost
  cost_parent <- node$parent$cost
  if(cost < 2800000 & cost_parent > 2800000) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

# Pruning the tree
Prune(acme, function(node) Pruner(node))
print(acme, "p", "cost")
       levelName  p    cost
1 Acme Inc.      NA 4950000
2  ¦--Accounting NA 1500000
3  ¦--Research   NA 2750000
4  °--IT         NA  700000

I have tried to preserve my data.tree object in several ways, but they all either generate HUGE files or take longer than generating a new tree from scratch.

# Saving object
save(acme, file = "acme.RData")
saveRDS(acme, "acme.rds")

# Generating a clone
acme_clone <- Clone(acme)

My next idea was to prune the tree only temporarily using the Get function, as the data.tree documentation states: "There are two variations of this: temporary pruning, e.g. just for printing: This is the pruneFun parameter, e.g. in Get; side effect or permanent pruning, meaning that you modify your data.tree structure for good. This is achieved with the Prune method."

It was not clear how to make this work, as there were no examples.
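As a minimal sketch of what the quoted passage seems to describe, the pruneFun argument of Get can be passed a function that returns TRUE for nodes to keep. The wrapper name KeepNode is hypothetical and not part of data.tree; it guards the root (which has no parent) on the assumption that the prune function might be evaluated there. The "Tester Inc" node from the example above is left out to keep the sketch short.

```r
library(data.tree)
data(acme)

# Roll up costs to the inner nodes, as in the question
Cost <- function(node) {
  result <- node$cost
  if (length(result) == 0) result <- sum(sapply(node$children, Cost))
  result
}
acme$Do(function(node) node$cost <- Cost(node), filterFun = isNotLeaf)

# Same rule as Pruner in the question
Pruner <- function(node) {
  node$cost < 2800000 & node$parent$cost > 2800000
}

# Hypothetical wrapper: always keep the root, since it has no parent cost
KeepNode <- function(node) node$isRoot || Pruner(node)

# Temporary pruning: only the nodes that pass KeepNode are visited
pruned_costs <- acme$Get("cost", pruneFun = KeepNode)
print(pruned_costs)

# The tree itself still has all of its nodes
print(acme$totalCount)
```

Because nothing is removed from acme, the same tree can be queried again with a different prune function without regenerating it.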

Answer by Esben Eickhardt (best answer)

After some fiddling around, I finally tried the following and it worked. There are no good examples, so I thought I would leave one here.

print(acme, "cost", pruneFun = function(node) Pruner(node)) 
       levelName    cost
1 Acme Inc.      4950000
2  ¦--Accounting 1500000
3  ¦--Research   2750000
4  °--IT          700000
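To check that this kind of pruning really is temporary, one option is to compare the node count before and after, and to collect the pruned view into a data frame with ToDataFrameTree, which also accepts pruneFun. This is only a sketch: KeepNode is a hypothetical root-safe wrapper around the question's rule, and the "Tester Inc" node is omitted for brevity.

```r
library(data.tree)
data(acme)

# Roll up costs to the inner nodes, as in the question
Cost <- function(node) {
  result <- node$cost
  if (length(result) == 0) result <- sum(sapply(node$children, Cost))
  result
}
acme$Do(function(node) node$cost <- Cost(node), filterFun = isNotLeaf)

# Hypothetical root-safe version of the question's pruning rule
KeepNode <- function(node) {
  node$isRoot || (node$cost < 2800000 & node$parent$cost > 2800000)
}

count_before <- acme$totalCount

# Pruned view as a data frame; acme itself is not modified
pruned_df <- ToDataFrameTree(acme, "cost", pruneFun = KeepNode)
print(pruned_df)

count_after <- acme$totalCount
print(c(before = count_before, after = count_after))
```

If the two counts match, the original tree is intact and further prunings with other criteria can be applied to it.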