Huggingface's use of a mixin keeps teasing me that this should be possible, but I can't find any clear documentation on exactly what the requirements are, or if the dependencies are just too much to make it worth it. The central module is literally thousands and thousands of lines, and I felt from studying it yesterday that I've learnt more about how to write beam search than I have about GenerationMixin. :-)
From reading the source I think the dependencies are self.config, plus prepare_inputs_for_generation() and _update_model_kwargs_for_generation(), and implicitly forward(). But I'm not sure that is everything, nor what each should look like. And I think it may expect forward() to return data in a specific format.
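To illustrate my (possibly wrong) understanding of the call flow, here is a plain-Python sketch with stand-in classes. None of this is real transformers code - the class, method, and field names just mirror the hooks above, and the "model" is a dummy that always favours one token:

```python
from dataclasses import dataclass

# Stand-in for transformers' ModelOutput: generate() reads .logits off
# whatever forward() returns. (minGPT instead returns a (logits, loss)
# tuple, which is one reason it can't be plugged in unmodified.)
@dataclass
class FakeOutput:
    logits: list  # shape [batch, seq_len, vocab_size] in the real thing

class FakeModel:
    vocab_size = 4

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # Map generate()'s decoding state onto forward()'s arguments.
        # With a KV cache you would slice input_ids to the last token here.
        return {"input_ids": input_ids}

    def forward(self, input_ids):
        # Dummy model: always favours token (last_token + 1) % vocab_size.
        logits = []
        for seq in input_ids:
            favoured = (seq[-1] + 1) % self.vocab_size
            logits.append([[1.0 if t == favoured else 0.0
                            for t in range(self.vocab_size)]])
        return FakeOutput(logits=logits)

    def _update_model_kwargs_for_generation(self, outputs, model_kwargs):
        # Real versions extend the attention mask / carry past_key_values.
        return model_kwargs

    def greedy_step(self, input_ids, **model_kwargs):
        # One decoding step, in the order generate() invokes the hooks.
        model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
        outputs = self.forward(**model_inputs)
        model_kwargs = self._update_model_kwargs_for_generation(outputs, model_kwargs)
        next_tokens = [max(range(self.vocab_size), key=lambda t: row[-1][t])
                       for row in outputs.logits]
        return [seq + [tok] for seq, tok in zip(input_ids, next_tokens)]
```

For example, FakeModel().greedy_step([[0, 1]]) extends the sequence to [[0, 1, 2]]. The point is only the shape of the contract: prepare inputs, call forward, read logits, update kwargs, repeat.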
To make the discussion specific, and generally useful: how could Huggingface's beam search be used with minGPT, which has a forward() function that returns logits, loss? (It actually has its own generate() function that does the equivalent of Huggingface's sample() and greedy_search(), but no beam search support.) Or nanoGPT if you prefer - they are identical in this area.
In the comments I said "It seems everyone's generate/beam search implementation is tied in closely with their transformer implementation...", and I still can't really see why everyone reinvents this wheel, or why there is no standalone open source beam search implementation with a clearly defined interface. Going to throw a bounty at this question to see if it helps.
If you want to use huggingface code, what you're looking for is generate from the GenerationMixin class (see here). So your options are either to adapt your code to inherit from GenerationMixin, or to copy the relevant code over. Either way it depends on your model being huggingface-friendly, so just plugging in a random one without adjusting the code won't work.
If you don't necessarily want to use Huggingface code, there are a number of very convenient implementations on github that are easier to adapt, for example here.