I'm parsing a command string and creating an abstract syntax tree (AST) composed of nodes of different types. I'm trying to figure out an efficient way to execute the command.
This app is extremely performance sensitive. It's for a query engine that will need to process millions of items while the user waits.
The standard way to do this in a non-Rust language is to have each node type implement a common interface. To do this in Rust with a trait, though, you have to box the child nodes (Box<dyn Node>), and every call then goes through dynamic dispatch. For example:
trait Node {
    fn get(&mut self) -> u32;
}

struct MyParentNode {
    my_int: u32,
    left: Box<dyn Node>,
    right: Box<dyn Node>,
}

impl Node for MyParentNode {
    fn get(&mut self) -> u32 {
        self.left.get() + self.right.get()
    }
}

struct MyLeafNode {
    my_int: u32,
}

impl Node for MyLeafNode {
    fn get(&mut self) -> u32 {
        self.my_int
    }
}
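For concreteness, here is how such a tree of boxed nodes gets built and evaluated (the `build_tree` helper and the leaf values are mine, just for illustration); every child is behind a `Box<dyn Node>`, so each `get` call goes through the vtable:

```rust
trait Node {
    fn get(&mut self) -> u32;
}

struct MyParentNode {
    my_int: u32,
    left: Box<dyn Node>,
    right: Box<dyn Node>,
}

impl Node for MyParentNode {
    fn get(&mut self) -> u32 {
        self.left.get() + self.right.get()
    }
}

struct MyLeafNode {
    my_int: u32,
}

impl Node for MyLeafNode {
    fn get(&mut self) -> u32 {
        self.my_int
    }
}

// Build the tree (2 + 4). Each child is boxed, so the calls in
// MyParentNode::get are dynamically dispatched.
fn build_tree() -> Box<dyn Node> {
    Box::new(MyParentNode {
        my_int: 0,
        left: Box::new(MyLeafNode { my_int: 2 }),
        right: Box::new(MyLeafNode { my_int: 4 }),
    })
}

fn main() {
    let mut root = build_tree();
    println!("{}", root.get()); // 6
}
```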
I have seen benchmarks that suggest that dynamic dispatch is very slow compared to calling a concrete function.
One alternative, not much better, is to use enums as node types:
enum NodeEnum {
    Parent(Box<ParentInfo>),
    Leaf(Box<LeafInfo>),
}

impl NodeEnum {
    fn get(&mut self) -> u32 {
        match self {
            NodeEnum::Parent(parent) => parent.get(),
            NodeEnum::Leaf(leaf) => leaf.get(),
        }
    }
}

struct ParentInfo {
    left: NodeEnum,
    right: NodeEnum,
}

impl ParentInfo {
    fn get(&mut self) -> u32 {
        self.left.get() + self.right.get()
    }
}

struct LeafInfo {
    my_int: u32,
}

impl LeafInfo {
    fn get(&mut self) -> u32 {
        self.my_int
    }
}
(The code above may not look terrible, but the real app will have a couple dozen node types and a dozen methods on each, which means a hand-written match in every one of those methods.)
There has to be a better way. Is there any way to implement this without the overhead of dynamic dispatch or having to call match on every call to get()?
Can function pointers help?
Please find below a quick-and-dirty experiment around your question.
I'm aware that the way I did the timing (in release mode, though, and with help from cargo flamegraph) is not rigorous enough (dedicated tooling exists for that purpose); this just gives a coarse-grained feeling.

Two solutions are tested. The first one uses dynamic dispatch. The second one uses the enum_dispatch crate in order to replace the virtual table with (automatically generated) match statements. There is a substantial gain in performance (4.7 vs 2.0 seconds in this specific case and on my computer).

Note that if the nodes in your actual use case do much more than the simple additions of this example, the dispatch cost may not be significant. In the end, only timing your actual use case can tell whether all of this is beneficial.
N.B.: in my first answer, I included the creation of the nodes in the timings, because I didn't know the use case. Obviously, much of the time was spent allocating/freeing, and I tested a third version to mitigate that. On the other hand, if we consider that the nodes are built once and used many times (as in this new answer), then the dispatch cost becomes significant.