I've created a SageMaker model for a Seq2Seq neural network and then started a SageMaker endpoint for it:
create_endpoint_config_response = sage.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

create_endpoint_response = sage.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
This standard endpoint does not support beam search. What is the best approach for creating a SageMaker endpoint that supports beam search?
Based on your comment, I believe the only solution is to create your own Docker container for inference. That way you can load your already trained model and do whatever you like with it. This example is a good place to start if you want to learn how to use Docker with SageMaker.
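For reference, a custom inference container only has to expose two HTTP routes that SageMaker calls on port 8080: GET /ping for health checks and POST /invocations for predictions. Below is a minimal sketch of such a server; `load_model` and `beam_search_decode` are hypothetical placeholders standing in for your own Seq2Seq code, not real library functions:

```python
# Minimal sketch of the HTTP server a custom SageMaker inference container must run.
# SageMaker calls GET /ping (health check) and POST /invocations (predictions) on port 8080.
from flask import Flask, request, jsonify

app = Flask(__name__)
model = None  # lazily loaded trained Seq2Seq model


def load_model():
    # Hypothetical: load your trained model artifacts from /opt/ml/model
    raise NotImplementedError


def beam_search_decode(model, source_tokens, beam_width=5):
    # Hypothetical: run beam-search decoding with your framework of choice
    raise NotImplementedError


@app.route('/ping', methods=['GET'])
def ping():
    # Health check: return 200 so SageMaker considers the container healthy
    return '', 200


@app.route('/invocations', methods=['POST'])
def invocations():
    global model
    if model is None:
        model = load_model()
    payload = request.get_json()
    translation = beam_search_decode(
        model,
        payload['source'],
        beam_width=payload.get('beam_width', 5))
    return jsonify({'translation': translation})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```

Because you control the `/invocations` handler, you can expose whatever decoding options you want (beam width, length penalty, and so on) as part of the request payload.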
For your use case it would be best to find the source code for the SageMaker built-in Seq2Seq model (the built-in algorithms are also just Docker images), modify it to your needs, build the modified Docker image, and push it to your AWS ECR registry, from where SageMaker can pull it.
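Once the modified image is in ECR, you register it with SageMaker the same way the built-in algorithms are registered, by pointing `create_model` at your image URI; your existing `create_endpoint_config` / `create_endpoint` calls then work unchanged. A rough sketch, where every name, ARN, and URI is a placeholder you would substitute with your own:

```python
import boto3

sage = boto3.client('sagemaker')

# Placeholders: replace with your own ECR image URI, model artifact location, and IAM role ARN.
image_uri = '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-seq2seq-beam-search:latest'
model_artifacts = 's3://my-bucket/seq2seq/model.tar.gz'
role_arn = 'arn:aws:iam::123456789012:role/MySageMakerRole'

create_model_response = sage.create_model(
    ModelName='seq2seq-beam-search',
    PrimaryContainer={
        'Image': image_uri,              # your modified container pushed to ECR
        'ModelDataUrl': model_artifacts  # trained weights from your training job
    },
    ExecutionRoleArn=role_arn)

# The endpoint-config and endpoint creation calls from the question stay the same;
# only the model behind them now points at your custom container.
```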
Unfortunately, I don't know whether the source code for those Docker containers is publicly available (I didn't find it on a first search).