Variable parameterised over a trait not a struct?


I'm trying to wrap my head around Rust's generics. I'm writing something to extract HTML from different web sites. What I want is something like this:

trait CanGetTitle {
    fn get_title(&self) -> String;
}

struct Spider<T: CanGetTitle> {
    pub parser: T
}

struct GoogleParser;
impl CanGetTitle for GoogleParser {
    fn get_title(&self) -> String {
        "title from H1".to_string()
    }
}

struct YahooParser;
impl CanGetTitle for YahooParser {
    fn get_title(&self) -> String {
        "title from H2".to_string()
    }
}

enum SiteName {
    Google,
    Yahoo,
}

impl SiteName {
    fn from_url(url: &str) -> SiteName {
        SiteName::Google
    }
}

fn main() {
    let url = "http://www.google.com";
    let site_name = SiteName::from_url(&url);
    let spider: Spider<_> = match site_name {
        SiteName::Google => Spider { parser: GoogleParser },
        SiteName::Yahoo => Spider { parser: YahooParser }
    };

    spider.parser.get_title();    // fails
}

I'm getting an error about the match returning Spiders parameterised over two different types. It expects it to return Spider<GoogleParser> because that's the return type of the first arm of the pattern match.

How can I declare that spider should be any Spider<T: CanGetTitle>?


There are 2 answers

Shepmaster On BEST ANSWER

How can I declare that spider should be any Spider<T: CanGetTitle>?

You cannot. Simply put, the compiler would have no idea how much space to allocate to store spider on the stack.

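Instead, you will want to use a trait object, Box<dyn CanGetTitle> (written Box<CanGetTitle> before the dyn keyword was introduced). Because Spider requires T: CanGetTitle, the Box itself has to implement the trait, so the first step is an impl that forwards get_title to the boxed value: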

impl<T: ?Sized> CanGetTitle for Box<T>
where
    T: CanGetTitle,
{
    fn get_title(&self) -> String {
        (**self).get_title()
    }
}

fn main() {
    let innards: Box<dyn CanGetTitle> = match SiteName::Google {
        SiteName::Google => Box::new(GoogleParser),
        SiteName::Yahoo => Box::new(YahooParser),
    };
    let spider = Spider { parser: innards };
}
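
For reference, here is a self-contained sketch that puts the pieces from the question and this answer together and actually calls get_title() on the boxed parser. The make_spider helper is hypothetical; it is only there so the match lives in one place.

```rust
trait CanGetTitle {
    fn get_title(&self) -> String;
}

// Forward the trait through Box so that Box<dyn CanGetTitle>
// satisfies the `T: CanGetTitle` bound on Spider.
impl<T: CanGetTitle + ?Sized> CanGetTitle for Box<T> {
    fn get_title(&self) -> String {
        (**self).get_title()
    }
}

struct Spider<T: CanGetTitle> {
    pub parser: T,
}

struct GoogleParser;
impl CanGetTitle for GoogleParser {
    fn get_title(&self) -> String {
        "title from H1".to_string()
    }
}

struct YahooParser;
impl CanGetTitle for YahooParser {
    fn get_title(&self) -> String {
        "title from H2".to_string()
    }
}

enum SiteName {
    Google,
    Yahoo,
}

// Hypothetical helper: both arms produce the same type, Box<dyn CanGetTitle>,
// so the match type-checks and the Spider has a single concrete type.
fn make_spider(site: SiteName) -> Spider<Box<dyn CanGetTitle>> {
    let parser: Box<dyn CanGetTitle> = match site {
        SiteName::Google => Box::new(GoogleParser),
        SiteName::Yahoo => Box::new(YahooParser),
    };
    Spider { parser }
}

fn main() {
    let spider = make_spider(SiteName::Google);
    println!("{}", spider.parser.get_title());
    let spider = make_spider(SiteName::Yahoo);
    println!("{}", spider.parser.get_title());
}
```

The type annotation on parser matters: without it, the compiler would infer Box<GoogleParser> from the first arm and reject the second.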
Peter Hall On

How can I declare that spider should be any Spider<T: CanGetTitle>?

Just to add a little to what @Shepmaster already said: spider cannot be any Spider<T>, because it has to be exactly one Spider<T>. Rust implements generics using monomorphization, which means it compiles a separate version of your generic code for each concrete type that is used. If the compiler cannot deduce a unique T for a particular use, that is a compile error. In your case, the compiler deduced from the first match arm that the type must be Spider<GoogleParser>, but the second arm then produces a Spider<YahooParser>.

Using a trait object lets you defer the choice of concrete type to runtime. By storing the actual object on the heap behind a Box, the compiler knows how much space needs to be stack allocated (just the size of the Box). But this comes with performance costs: there is extra pointer indirection when the data needs to be accessed and, more significantly, the optimising compiler generally cannot inline virtual calls.
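
To make the size argument concrete, here is a small sketch using std::mem::size_of. SmallParser and LargeParser are hypothetical types invented for the illustration, and the two-pointer size of the box is what you get on typical targets (a data pointer plus a vtable pointer).

```rust
use std::mem::size_of;

trait CanGetTitle {
    fn get_title(&self) -> String;
}

// Hypothetical parsers with very different sizes.
struct SmallParser;
struct LargeParser {
    buffer: [u8; 256],
}

impl CanGetTitle for SmallParser {
    fn get_title(&self) -> String {
        "small".to_string()
    }
}

impl CanGetTitle for LargeParser {
    fn get_title(&self) -> String {
        format!("large ({} byte buffer)", self.buffer.len())
    }
}

fn main() {
    // The concrete types differ in size, so a variable that could hold
    // "either one" by value has no single size the compiler could reserve.
    println!("SmallParser: {} bytes", size_of::<SmallParser>());
    println!("LargeParser: {} bytes", size_of::<LargeParser>());

    // The boxed trait object has one fixed size regardless of what it points to.
    println!("Box<dyn CanGetTitle>: {} bytes", size_of::<Box<dyn CanGetTitle>>());

    // Both parsers fit in the same Vec because each element is just a Box.
    let parsers: Vec<Box<dyn CanGetTitle>> = vec![
        Box::new(SmallParser),
        Box::new(LargeParser { buffer: [0; 256] }),
    ];
    for parser in &parsers {
        println!("{}", parser.get_title());
    }
}
```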

It is often possible to rejig things so you can work with a monomorphic type anyway. One way to do that in your case is to avoid the temporary assignment to a polymorphic variable, and use the value only at a place where you know its concrete type:

fn do_stuff<T: CanGetTitle>(spider: Spider<T>) {
    println!("{:?}", spider.parser.get_title());
}

fn main() {
    let url = "http://www.google.com";
    let site_name = SiteName::from_url(&url);
    match site_name {
        SiteName::Google => do_stuff(Spider { parser: GoogleParser }),
        SiteName::Yahoo => do_stuff(Spider { parser: YahooParser })
    };
}

Notice that each time do_stuff is called, T resolves to a different type. You only write one implementation of do_stuff, but the compiler monomorphizes it twice - once for each type that you called it with.
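
One way to see this for yourself is std::any::type_name, which reports the type a generic function was instantiated with. The describe function below is a hypothetical stand-in for do_stuff; the exact string that type_name returns is not guaranteed and usually includes a crate or module prefix, so treat the output as diagnostic only.

```rust
use std::any::type_name;

trait CanGetTitle {
    fn get_title(&self) -> String;
}

struct Spider<T: CanGetTitle> {
    pub parser: T,
}

struct GoogleParser;
impl CanGetTitle for GoogleParser {
    fn get_title(&self) -> String {
        "title from H1".to_string()
    }
}

struct YahooParser;
impl CanGetTitle for YahooParser {
    fn get_title(&self) -> String {
        "title from H2".to_string()
    }
}

// One generic definition; the compiler emits a separate copy for each T
// it is called with, and each copy knows its own concrete T.
fn describe<T: CanGetTitle>(spider: &Spider<T>) -> String {
    format!("{} -> {}", type_name::<T>(), spider.parser.get_title())
}

fn main() {
    println!("{}", describe(&Spider { parser: GoogleParser }));
    println!("{}", describe(&Spider { parser: YahooParser }));
}
```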

If you use a Box<dyn CanGetTitle>, each call to parser.get_title() is dispatched through the trait object's vtable. The generic version will usually be faster, because it avoids that lookup and gives the compiler the opportunity to inline the body of parser.get_title() in each case.
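
The two dispatch styles can be put side by side in a minimal sketch. The function names title_static and title_dynamic are made up for this example; both return the same result and differ only in how the call to get_title is resolved.

```rust
trait CanGetTitle {
    fn get_title(&self) -> String;
}

struct GoogleParser;
impl CanGetTitle for GoogleParser {
    fn get_title(&self) -> String {
        "title from H1".to_string()
    }
}

// Static dispatch: monomorphized per T, so the callee is known at
// compile time and is a candidate for inlining.
fn title_static<T: CanGetTitle>(parser: &T) -> String {
    parser.get_title()
}

// Dynamic dispatch: a single compiled function; the callee is found
// at runtime through the vtable carried by the &dyn reference.
fn title_dynamic(parser: &dyn CanGetTitle) -> String {
    parser.get_title()
}

fn main() {
    let parser = GoogleParser;
    println!("{}", title_static(&parser));
    println!("{}", title_dynamic(&parser));
}
```

Whether the difference is measurable depends on how hot the call is; for a spider that spends its time on network I/O, the trait object is often the simpler choice.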