Introduction
After reading the "Ruby Outperforms C" article, I got curious about how Rust would perform. So I started working on a Ruby gem with a Rust extension that implements a GraphQL parser. For the parser implementation I'm using the graphql_parser crate. To interact with the Ruby runtime I'm using magnus, following this guide.
My approach
I wrote some Rust code that traverses the AST produced by graphql_parser and translates it to Ruby objects (hashes and arrays). The code looks like this:
fn parse(query: String) -> Result<RHash, Error> {
    match parse_query::<String>(&query) {
        Ok(r) => Ok(translation::translate_document(&r)),
        Err(e) => Err(Error::new(exception::runtime_error(), e.to_string())),
    }
}
fn translate_document(doc: &Document<'_, String>) -> RHash {
    let hash = build_ruby_node("document");
    // Translate every top-level definition into its own Ruby hash.
    let definitions = RArray::new();
    for x in doc.definitions.iter() {
        definitions.push(translate_definition(x)).unwrap();
    }
    hash.aset(Symbol::new("definitions"), definitions).unwrap();
    hash
}
fn translate_definition(definition: &Definition<'_, String>) -> RHash {
    match definition {
        Definition::Operation(operation) => translate_operation_definition(operation),
        Definition::Fragment(fragment) => translate_fragment_definition(fragment),
    }
}
// Many more functions that follow the structure produced by graphql_parser...
// Creates the base hash shared by all nodes: { node_type: :<node_type> }.
fn build_ruby_node(node_type: &str) -> RHash {
    let hash = RHash::new();
    hash.aset(Symbol::new("node_type"), Symbol::new(node_type))
        .unwrap();
    hash
}
Essentially I have a bunch of recursive functions that closely mirror the document structure defined by graphql_parser. Full code is in the GitHub repo.
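To make the shape of those recursive functions concrete without needing a Ruby VM, here is a minimal, self-contained sketch. It is not the gem's code: `Value`, `Definition`, and `Document` are hypothetical stand-ins (the real types come from magnus and graphql_parser), with `Value` playing the role of Ruby symbols, strings, arrays, and hashes.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for Ruby objects (Symbol, RString, RArray, RHash in magnus).
#[derive(Debug, PartialEq)]
enum Value {
    Symbol(String),
    Str(String),
    Array(Vec<Value>),
    Hash(BTreeMap<String, Value>),
}

// Hypothetical, heavily simplified AST; graphql_parser's real types are much richer.
enum Definition {
    Operation { name: String },
    Fragment { name: String },
}

struct Document {
    definitions: Vec<Definition>,
}

// Every translated node starts as a hash tagged with its node type.
fn build_node(node_type: &str) -> BTreeMap<String, Value> {
    let mut hash = BTreeMap::new();
    hash.insert(
        "node_type".to_string(),
        Value::Symbol(node_type.to_string()),
    );
    hash
}

// One function per AST node type, each delegating to the functions for its children.
fn translate_document(doc: &Document) -> Value {
    let mut hash = build_node("document");
    let definitions: Vec<Value> = doc.definitions.iter().map(translate_definition).collect();
    hash.insert("definitions".to_string(), Value::Array(definitions));
    Value::Hash(hash)
}

fn translate_definition(definition: &Definition) -> Value {
    let (node_type, name) = match definition {
        Definition::Operation { name } => ("operation", name),
        Definition::Fragment { name } => ("fragment", name),
    };
    let mut hash = build_node(node_type);
    hash.insert("name".to_string(), Value::Str(name.clone()));
    Value::Hash(hash)
}
```

Note that even in this toy version each node costs several separate values (a hash, a symbol, a string, and an array for the document), which is the pattern the real translation repeats for every node in the query.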
Benchmark
I was hoping the approach described above would perform about as well as the following:
fn parse_raw(query: String) -> String {
    let ast = parse_query::<&str>(&query);
    format!("#{:?}", ast)
}
My reasoning was that this does essentially the same work: it traverses the AST and produces a Ruby object (a string) containing the same data. However, the benchmark shows that my code is about 4.5x slower (roughly 54 i/s versus 245 i/s):
$ bundle exec ruby benchmark.rb
Warming up --------------------------------------
               parse     4.000  i/100ms
           parse_raw    24.000  i/100ms
Calculating -------------------------------------
               parse     53.968  (± 3.7%) i/s -    272.000  in   5.050661s
           parse_raw    245.115  (± 2.4%) i/s -      1.248k in   5.094948s
I ran this with Ruby 3.2.2 (ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]).
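A caveat about the assumption that the two functions do the same work, offered as a hypothesis and not as a measured result: `parse_raw` builds one string for the whole AST, while `parse` creates a separate Ruby object for every node, field, and symbol. The two also call the parser differently (`parse_query::<String>` versus `parse_query::<&str>`), so the former presumably allocates an owned string for each name. The sketch below only illustrates the difference in object counts; `Node`, `build_tree`, and the "three objects per node" figure are hypothetical and unrelated to magnus or graphql_parser.

```rust
// Hypothetical tree standing in for a parsed AST.
#[derive(Debug)]
struct Node {
    name: String,
    children: Vec<Node>,
}

// Approach A: format the whole tree into a single string (one result object).
fn as_debug_string(root: &Node) -> String {
    format!("{:?}", root)
}

// Approach B: count the objects a per-node translation would create,
// assuming one hash, one name string, and one children array per node.
fn count_translated_objects(node: &Node) -> usize {
    3 + node
        .children
        .iter()
        .map(count_translated_objects)
        .sum::<usize>()
}

// Build a complete tree with the given depth and fan-out.
fn build_tree(depth: usize, fan_out: usize) -> Node {
    let children = if depth == 0 {
        Vec::new()
    } else {
        (0..fan_out)
            .map(|_| build_tree(depth - 1, fan_out))
            .collect()
    };
    Node {
        name: format!("n{}", depth),
        children,
    }
}
```

For a tree of depth 3 with fan-out 3 (40 nodes), approach B would create 120 objects under the stated assumption, while approach A still returns a single string. Whether this accounts for the gap in the benchmark above would need profiling to confirm.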
Question
I'm a Rust beginner. I'd like to understand where the performance bottleneck in my code is. Am I doing something obviously wrong? Is allocating Ruby objects just inherently slow? Is there a way I can make this gem more performant?
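One way to start narrowing down the bottleneck is to time the parsing and translation phases separately. The helper below is a minimal sketch using only the standard library; `time_phase` is a hypothetical name, and in the gem the closures passed to it would wrap `parse_query` alone and `parse_query` followed by `translate_document`.

```rust
use std::time::{Duration, Instant};

// Run `f` `iterations` times and return the last result together with
// the total elapsed wall-clock time for all iterations.
fn time_phase<T>(iterations: u32, mut f: impl FnMut() -> T) -> (T, Duration) {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    let mut last = f();
    for _ in 1..iterations {
        last = f();
    }
    (last, start.elapsed())
}
```

Subtracting the parse-only time from the parse-plus-translate time gives a rough estimate of the translation cost; a sampling profiler would give a more detailed picture.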