Error in retraining solution for batch inference jobs


Update: We got in contact with the AWS Personalize team. The error occurs because the model's size exceeds 13 GB, and writing it to S3 fails with an upload-size error. The issue is on their end and they are working on a fix, with no ETA. If anyone knows of a workaround we can implement from our end, please share.

I have created a solution using the aws-user-personalization recipe, which I use only with batch inference jobs.

Upon creating a new solution version with trainingMode='UPDATE', I am getting an error:

InternalServerError: We encountered an internal error. Please try again.
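For context, the failing call is roughly the following (a minimal boto3 sketch; the solution ARN is a placeholder):

```python
import boto3

personalize = boto3.client("personalize")

# Placeholder ARN -- substitute the ARN of the existing solution.
response = personalize.create_solution_version(
    solutionArn="arn:aws:personalize:us-east-1:123456789012:solution/my-solution",
    trainingMode="UPDATE",  # incremental update of the latest FULL solution version
)
print(response["solutionVersionArn"])
```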

My items dataset contains over 1.5 million items. I am aware that it only trains using 750K items. But is it possible that, because I have so many items and new items coming in from the PutItems API, the new version is throwing this error? If yes, how can it be resolved?
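For reference, new items and events are streamed in roughly like this (a hedged boto3 sketch; the dataset ARN, tracking ID, and item/user IDs are placeholders):

```python
import json
from datetime import datetime, timezone

import boto3

events = boto3.client("personalize-events")

# Placeholder dataset ARN -- substitute the ARN of the items dataset.
events.put_items(
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/my-dataset-group/ITEMS",
    items=[{"itemId": "item-123", "properties": json.dumps({"CATEGORY": "books"})}],
)

# Placeholder tracking ID from the event tracker.
events.put_events(
    trackingId="tracking-id-placeholder",
    userId="user-456",
    sessionId="session-789",
    eventList=[
        {
            "eventType": "click",
            "itemId": "item-123",
            "sentAt": datetime.now(timezone.utc),
        }
    ],
)
```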

Additionally, if I am using a solution for batch inference jobs, do I have to retrain it manually to consider new events and items from the PutEvent/Item API? Does it not get automatically retrained?


1 Answer

James J

I recommend creating a ticket with AWS support to investigate the root cause of the InternalServerError.

Having more than 750K items in your items and/or interactions dataset will not cause an error.

Additionally, if I am using a solution for batch inference jobs, do I have to retrain it manually to consider new events and items from the PutEvent/Item API?

No, not with the user-personalization recipe. New items and new interactions added since the last full retraining are considered by batch inference jobs.

https://aws.amazon.com/about-aws/whats-new/2023/07/amazon-personalize-quality-batch-recommendations/
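For illustration, a batch inference job against the latest solution version might look like this (a hedged boto3 sketch; the ARNs, role, and S3 paths are placeholders):

```python
import boto3

personalize = boto3.client("personalize")

# All ARNs and S3 paths below are placeholders.
personalize.create_batch_inference_job(
    jobName="nightly-batch-recs",
    solutionVersionArn="arn:aws:personalize:us-east-1:123456789012:solution/my-solution/version-placeholder",
    roleArn="arn:aws:iam::123456789012:role/PersonalizeBatchRole",
    jobInput={"s3DataSource": {"path": "s3://my-bucket/batch/input/users.json"}},
    jobOutput={"s3DataDestination": {"path": "s3://my-bucket/batch/output/"}},
    numResults=25,
)
```

The job reads the input users from S3, scores them with the specified solution version (including items and interactions ingested since the last full training), and writes the recommendations back to S3.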

Does it not get automatically retrained?

The update that runs before a batch inference job is not retraining. You should still periodically retrain the model (i.e., create a solution version with trainingMode=FULL).
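A periodic full retrain could be triggered with a call along these lines (a minimal boto3 sketch; the solution ARN is a placeholder):

```python
import boto3

personalize = boto3.client("personalize")

# Full retraining rebuilds the model on the complete dataset,
# unlike the UPDATE mode used between batch inference runs.
response = personalize.create_solution_version(
    solutionArn="arn:aws:personalize:us-east-1:123456789012:solution/my-solution",
    trainingMode="FULL",
)
print(response["solutionVersionArn"])
```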