I need a little advice on deploying Triton Inference Server with explicit model control. From the looks of it, this mode gives the user the most control over which models go live. But the problem I'm not able to solve is how to load models when the server goes down in production and a new instance spins up.
The only solution I can think of is to have a service poll the server at regular intervals, constantly checking whether my live models are actually loaded and, if not, loading them. But this seems like quite a complicated process.
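Roughly, this is the kind of watchdog I have in mind (a minimal sketch; the endpoint URL, model names, and poll interval are placeholders, while the `/v2/health/ready`, `/v2/models/<name>/ready`, and `/v2/repository/models/<name>/load` routes are Triton's standard HTTP API):

```python
import time
import requests

TRITON_URL = "http://localhost:8000"      # assumed Triton HTTP endpoint
LIVE_MODELS = ["model_a", "model_b"]      # hypothetical model names
POLL_INTERVAL_S = 30                      # arbitrary polling period

def model_is_ready(name: str) -> bool:
    # Triton returns 200 only when the model is loaded and ready.
    r = requests.get(f"{TRITON_URL}/v2/models/{name}/ready")
    return r.status_code == 200

def load_model(name: str) -> None:
    # Model-repository extension; only works in explicit model-control mode.
    r = requests.post(f"{TRITON_URL}/v2/repository/models/{name}/load")
    r.raise_for_status()

while True:
    try:
        # Skip the cycle entirely if the server itself isn't ready yet.
        if requests.get(f"{TRITON_URL}/v2/health/ready").status_code == 200:
            for model in LIVE_MODELS:
                if not model_is_ready(model):
                    load_model(model)
    except requests.ConnectionError:
        pass  # new instance still starting up; retry on the next cycle
    time.sleep(POLL_INTERVAL_S)
```

It works, but it means running and monitoring an extra sidecar service just to keep the server's state correct.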
I would like to know how others have solved this problem.
Thanks in advance