I am using AWS Personalize for a recommendation system. My use case involves adding and deleting items from a catalog daily. I use the PutItems API to continually add items and to logically delete them. However, I have read that this can cause the quality of recommendations to degrade over time. That makes sense: as time passes, the deleted items will far outnumber the active items, yet they are still considered in training.
If I run a bulk import job without the deleted items, it does not replace the items added via the API, so the deleted items remain.
Is there any way to delete items that have been added via the PutItems API? Or are there any other tricks for deleting items in AWS Personalize?
The method I tried was:
- Logically delete the item via PutItems
- Do a full dataset export job and remove all deleted items
- Run a full import job containing only the non-deleted items, to replace the previous items
- Retrain my solutions
I expected this to remove all the deleted items; however, the import job in step 3 does not replace the items added via the PutItems API.
The only way to delete items that were added via PutItems from the items dataset is to delete the items dataset, recreate it, and then reimport the active items. This can be problematic, though, for a production application that uses a Personalize campaign or recommender and filter(s), since there would be downtime. Note that a bulk dataset import job only replaces data from previous bulk imports; it does not replace any items added via PutItems. That is why the entire dataset has to be deleted to remove items. Also note that deleting items from the items dataset does not delete their interactions from the interactions dataset, so that behavioral data still exists and is still considered during training.
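The delete, recreate, and reimport sequence described above can be sketched with the AWS SDK for Python. This is a minimal, non-authoritative sketch: personalize is assumed to be a boto3.client("personalize"), rebuild_items_dataset and its parameters are hypothetical names, and the CSV at s3_path is assumed to already contain only active items (for example, produced by the export-and-filter steps in the question). AWS may refuse the delete while dependent resources or in-progress jobs exist, and the import job runs asynchronously, so you still have to wait for it to finish and retrain afterwards.

```python
import time
from typing import Any, Callable


def rebuild_items_dataset(
    personalize: Any,  # assumed: boto3.client("personalize")
    old_dataset_arn: str,
    dataset_group_arn: str,
    schema_arn: str,
    s3_path: str,  # assumed: CSV of active items only
    role_arn: str,
    name: str = "items",
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical helper: delete, recreate, and bulk-reimport the Items dataset.

    Returns the ARN of the (asynchronous) dataset import job.
    """
    # Step 1: delete the dataset; this is what drops records added via PutItems.
    personalize.delete_dataset(datasetArn=old_dataset_arn)

    # Step 2: poll until describe_dataset reports that the dataset is gone.
    while True:
        try:
            personalize.describe_dataset(datasetArn=old_dataset_arn)
        except personalize.exceptions.ResourceNotFoundException:
            break
        sleep(poll_seconds)

    # Step 3: recreate an empty Items dataset with the same schema.
    new_dataset_arn = personalize.create_dataset(
        name=name,
        schemaArn=schema_arn,
        datasetGroupArn=dataset_group_arn,
        datasetType="Items",
    )["datasetArn"]

    # Step 4: wait for the new dataset to become ACTIVE.
    while True:
        status = personalize.describe_dataset(datasetArn=new_dataset_arn)["dataset"]["status"]
        if status == "ACTIVE":
            break
        if status == "CREATE FAILED":
            raise RuntimeError(f"dataset creation failed: {new_dataset_arn}")
        sleep(poll_seconds)

    # Step 5: start a FULL bulk import of the active items.
    job = personalize.create_dataset_import_job(
        jobName=f"{name}-reimport-{int(time.time())}",
        datasetArn=new_dataset_arn,
        dataSource={"dataLocation": s3_path},
        roleArn=role_arn,
        importMode="FULL",
    )
    return job["datasetImportJobArn"]
```

The sleep parameter is injectable only so the polling can be exercised without waiting; in production the default time.sleep is used.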
The recommended approach for an item catalog that changes often is to add a field to the items dataset that indicates whether an item is available, and then use a filter when retrieving recommendations so that only available items are included. For example, add an Items.IS_AVAILABLE column with a value of Yes or No. At inference time, a filter with an expression like the following can be used to include only available items:

INCLUDE ItemID WHERE Items.IS_AVAILABLE IN ("Yes")

Then, when you need to "delete" an item, call PutItems for that item with its IS_AVAILABLE value set to No. The filter will reflect this change within 15 minutes.
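The availability-flag approach can be sketched as two small helpers: one that creates the filter once, and one that "soft deletes" items through PutItems. This is a hedged sketch, not an official recipe. The helper names are hypothetical; personalize and events are assumed to be boto3.client("personalize") and boto3.client("personalize-events"); the properties keys are assumed to follow Personalize's camelCase convention for schema fields (IS_AVAILABLE becomes isAvailable). Because a PutItems call is assumed to rewrite the whole item record, the sketch expects you to pass each item's existing metadata so that other fields are not lost.

```python
import json
from typing import Any, Dict

# Filter expression that keeps only items whose IS_AVAILABLE field is "Yes".
AVAILABLE_ONLY = 'INCLUDE ItemID WHERE Items.IS_AVAILABLE IN ("Yes")'


def create_availability_filter(personalize: Any, dataset_group_arn: str,
                               name: str = "available-items-only") -> str:
    """Hypothetical helper: create the filter once and return its ARN."""
    response = personalize.create_filter(
        name=name,
        datasetGroupArn=dataset_group_arn,
        filterExpression=AVAILABLE_ONLY,
    )
    return response["filterArn"]


def soft_delete_items(events: Any, items_dataset_arn: str,
                      items: Dict[str, Dict[str, Any]]) -> int:
    """Hypothetical helper: mark items unavailable instead of deleting them.

    items maps itemId -> that item's current metadata (camelCase keys).
    Returns the number of PutItems calls made.
    """
    payload = [
        {"itemId": item_id,
         "properties": json.dumps({**metadata, "isAvailable": "No"})}
        for item_id, metadata in items.items()
    ]
    calls = 0
    # PutItems accepts at most 10 items per request, so send in batches.
    for start in range(0, len(payload), 10):
        events.put_items(datasetArn=items_dataset_arn,
                         items=payload[start:start + 10])
        calls += 1
    return calls
```

At inference, pass the returned filter ARN to GetRecommendations, e.g. runtime.get_recommendations(campaignArn=..., userId=..., filterArn=filter_arn) on a boto3.client("personalize-runtime"). Making an item available again is the same PutItems call with "Yes" instead of "No".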