The correct way to remove or update an Item

I am building a recommendation system for a classified ads website. Ads are added and deleted daily.

My idea is to use PutItems to add new ads with a field called status set to 0. If a user deletes the ad, I will call the same PutItems API with the same ITEM_ID to update the stored item with a different status, and use a filter to select only ads with status = 0 when generating recommendations.

Is that correct? Will the PutItems API update the existing ad? And is there any way to delete an item?

Answered by PatrykMilewski

Currently there is no way to remove items that were already added to Datasets.

Your workaround looks good. However, in my experience with Personalize, the filter might decrease the quality of your recommendations.

To understand why, this is more or less the algorithm that Personalize uses for filtering recommendations:

  • Get recommended items for user
  • Filter recommendations using filter expression
  • Return first N recommended items left after filtering

Because the filtering is done after the recommendations are generated, Personalize simply fills the recommendation list with items that were further down the ranked list.

The problem with that approach is that items lower on the list have a lower "Score" value, which indicates the accuracy of the recommendation. That's why you will generally end up with worse recommendations, although how much worse depends on how many unavailable ads (those whose status is no longer 0) were recommended before being filtered out.
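The effect can be illustrated without touching AWS at all. The following is a minimal sketch that only simulates the three steps above; the item IDs, scores, and the `post_filter_top_n` helper are made up for illustration, and it follows the question's convention that status = 0 marks an ad that is still available.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]  # (item_id, score), best first


def post_filter_top_n(ranked: Ranked, status_by_item: Dict[str, int], n: int) -> Ranked:
    """Drop unavailable items from an already ranked list, then keep the first n."""
    available = [(item, score) for item, score in ranked
                 if status_by_item.get(item) == 0]
    return available[:n]


def mean_score(items: Ranked) -> float:
    """Average score of a recommendation list."""
    return sum(score for _, score in items) / len(items)


# Hypothetical ranked output for one user, highest score first.
ranked = [("ad1", 0.40), ("ad2", 0.25), ("ad3", 0.15), ("ad4", 0.12), ("ad5", 0.08)]
# ad2 and ad3 were deleted by their owners, so their status is no longer 0.
status_by_item = {"ad1": 0, "ad2": 1, "ad3": 1, "ad4": 0, "ad5": 0}

unfiltered = ranked[:3]
filtered = post_filter_top_n(ranked, status_by_item, 3)

print("unfiltered:", unfiltered, "mean score:", mean_score(unfiltered))
print("filtered:  ", filtered, "mean score:", mean_score(filtered))
```

The filtered list is backfilled with ad4 and ad5, so its average score is lower than that of the unfiltered top three.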

To check your recommendation scores, simply get recommendations in the Personalize web UI. It returns a list of recommendations with scores.
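The same scores can also be read programmatically. This is a rough sketch, assuming `runtime_client` is something like `boto3.client("personalize-runtime")` and that the campaign and filter ARNs are your own; the helper name `recommendations_with_scores` is hypothetical.

```python
from typing import Any, List, Optional, Tuple


def recommendations_with_scores(
    runtime_client: Any,
    campaign_arn: str,
    user_id: str,
    num_results: int = 10,
    filter_arn: Optional[str] = None,
) -> List[Tuple[str, Optional[float]]]:
    """Return (itemId, score) pairs from a GetRecommendations call."""
    params = {
        "campaignArn": campaign_arn,
        "userId": user_id,
        "numResults": num_results,
    }
    if filter_arn:
        # Only send the filter when one is configured.
        params["filterArn"] = filter_arn
    response = runtime_client.get_recommendations(**params)
    # Not every recipe returns a score, so fall back to None.
    return [(item["itemId"], item.get("score")) for item in response["itemList"]]
```

Calling it once with and once without `filter_arn` for the same user makes it easy to compare how far the scores drop after filtering.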

Better approach

If your ads are updated daily, you can work around it by following these steps:

  • Create a Lambda function that is triggered every 24 hours
  • The Lambda fetches all of the ads and puts them into an S3 bucket as a CSV file. It should exclude ads that are no longer available (status other than 0, following the convention in your question)
  • Call the CreateDatasetImportJob API using any AWS SDK of your choice and point it at the data stored in the S3 bucket
  • Personalize will start the import job. When it finishes, all of the items are replaced with the newest dump
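The steps above could be sketched roughly as follows. This assumes `s3_client` and `personalize_client` are `boto3.client("s3")` and `boto3.client("personalize")`, that the Items schema has just `ITEM_ID` and `STATUS` columns, and that each ad is a dict with hypothetical `id` and `status` keys where status = 0 means available (the question's convention). Bucket, key, ARNs, and job name are placeholders.

```python
import csv
import io
from typing import Any, Dict, Iterable


def build_items_csv(ads: Iterable[Dict[str, Any]]) -> str:
    """Serialize the still-available ads (status == 0) as an Items CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ITEM_ID", "STATUS"])  # assumed schema columns
    for ad in ads:
        if ad["status"] == 0:
            writer.writerow([ad["id"], ad["status"]])
    return buffer.getvalue()


def refresh_items(s3_client: Any, personalize_client: Any, ads: Iterable[Dict[str, Any]],
                  bucket: str, key: str, dataset_arn: str, role_arn: str,
                  job_name: str) -> str:
    """Upload the dump to S3 and start a full import that replaces the bulk items."""
    body = build_items_csv(ads).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    response = personalize_client.create_dataset_import_job(
        jobName=job_name,  # must be unique per job, e.g. include the date
        datasetArn=dataset_arn,
        dataSource={"dataLocation": f"s3://{bucket}/{key}"},
        roleArn=role_arn,  # role that lets Personalize read the bucket
        importMode="FULL",  # replace the existing bulk data instead of appending
    )
    return response["datasetImportJobArn"]
```

Inside the Lambda handler you would load the ads from your own database and call `refresh_items` once per run. The import job itself runs asynchronously, so the function does not wait for it to finish.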

However, it has some downsides.

If you are not using the User-Personalization (aws-user-personalization) Recipe, then after each import of Items you need to update your Solution by creating a new Solution Version. Otherwise it won't include the changes made by the Items dataset import job.

Creating a new Solution Version is quite slow and expensive, which is why I would recommend using the User-Personalization Recipe if you want to take this approach. Since the HRNN Recipes are marked as legacy, it's a good idea to migrate anyway.

If you are using the User-Personalization Recipe, then according to the AWS documentation:
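For recipes that do need the manual retrain, the flow might look like the sketch below. It assumes `personalize_client` is `boto3.client("personalize")` and that the ARNs are your own; the `retrain_and_deploy` helper is hypothetical. The blocking loop is for illustration only, since training can take far longer than a single Lambda invocation is allowed to run.

```python
import time
from typing import Any


def retrain_and_deploy(personalize_client: Any, solution_arn: str,
                       campaign_arn: str, poll_seconds: float = 60) -> str:
    """Train a new Solution Version and point the campaign at it once it is ACTIVE."""
    version_arn = personalize_client.create_solution_version(
        solutionArn=solution_arn
    )["solutionVersionArn"]

    while True:
        status = personalize_client.describe_solution_version(
            solutionVersionArn=version_arn
        )["solutionVersion"]["status"]
        if status == "ACTIVE":
            break
        if status in ("CREATE FAILED", "CREATE STOPPED"):
            raise RuntimeError(f"Solution version ended with status {status}")
        time.sleep(poll_seconds)  # training is slow, so poll sparingly

    # The campaign keeps serving the old version until it is updated.
    personalize_client.update_campaign(
        campaignArn=campaign_arn, solutionVersionArn=version_arn
    )
    return version_arn
```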

Amazon Personalize automatically updates your latest solution version every two hours to include new data. Your campaign automatically uses the updated solution version. For more information see Automatic Updates.

So pretty much all of the work is done on the Personalize side, and you don't have to worry about retraining the Solution after each Items import job.

And the last problem...

Since the documentation for the User-Personalization Recipe says that your solution will be updated within two hours, you might end up recommending items that are no longer available for a short period of time. If you are updating items daily, this might be a significant problem.

To fix that case, I would recommend simply using the Filter approach that you mentioned. That way you get the benefits of both approaches and your recommendations are always valid.
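For the status update that the filter relies on, a PutItems call could look roughly like this. It is a sketch under several assumptions: `events_client` is `boto3.client("personalize-events")`, the Items schema has a `STATUS` field (sent as the camel-case key `status`), property values are passed as strings, and, as the question presumes, re-sending an existing itemId overwrites the stored item. The `set_ad_status` helper is hypothetical.

```python
import json
from typing import Any


def set_ad_status(events_client: Any, dataset_arn: str, item_id: str, status: int) -> None:
    """Send one item with an updated status through PutItems."""
    events_client.put_items(
        datasetArn=dataset_arn,
        items=[
            {
                "itemId": item_id,
                # properties is a JSON string; values sent as strings (assumption)
                "properties": json.dumps({"status": str(status)}),
            }
        ],
    )
```

Calling `set_ad_status(client, items_dataset_arn, "ad-123", 1)` when a user deletes an ad would flag it as unavailable between two daily dumps, so the status = 0 filter can exclude it.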