I really don't know where to start looking for the right algorithm.

I'm building a web application that collects schema.org data from different webshops such as Amazon, Shopify, etc. It collects data every 6 hours and shows the current and the lowest price. Users rely on it to monitor products and buy them at the lowest price.

My goal is to recognize listings from different shops as the same product, even though every shop uses its own title for that product.

Example:

Google Pixel 2 64GB Clearly White (Unlocked) Smartphone 
Google Pixel 2 GSM/CDMA Google Unlocked (Clearly White, 64GB, US warranty) 
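To see how much the two example titles actually overlap, here is a minimal sketch that compares their token sets with Jaccard similarity (size of the intersection divided by size of the union). It assumes titles are lowercased and split on non-alphanumeric characters; `tokenize` and `jaccard` are hypothetical helper names.

```python
import re

def tokenize(title: str) -> set[str]:
    # Assumption: lowercase, then keep runs of letters/digits as tokens.
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def jaccard(title_a: str, title_b: str) -> float:
    """Jaccard similarity of two titles' token sets (1.0 = identical sets)."""
    a, b = tokenize(title_a), tokenize(title_b)
    union = a | b
    # Two titles with no tokens at all are treated as not similar.
    return len(a & b) / len(union) if union else 0.0

a = "Google Pixel 2 64GB Clearly White (Unlocked) Smartphone"
b = "Google Pixel 2 GSM/CDMA Google Unlocked (Clearly White, 64GB, US warranty)"
print(jaccard(a, b))
```

For these two titles, 7 of the 12 distinct tokens are shared, so the score is 7/12 ≈ 0.58 even though both describe the same phone.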

Problems:

  1. I don't have much data (only the products chosen by users).
  2. It needs to support every new product, including products for which the app has no data history.

1 Answer

Siddhant Tandon On Best Solutions

This might not be the best solution, but perhaps you could try a recommender system. More specifically, you could try an item-item content-based recommendation system. The idea is to extract features from the items themselves (in your case, the items are the product descriptions). An item profile is built from these features, for example tf-idf weights or a simple term-frequency weighting scheme. After building a profile for every item, you find the items most similar to a given item using a similarity measure such as cosine similarity or Jaccard similarity. The items with the highest similarity scores are the most similar ones, and the top match will probably be the same product as the input product.
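A minimal sketch of the content-based approach in plain Python, using only the standard library. It assumes each item profile is a smoothed tf-idf vector over lowercased title tokens, compared with cosine similarity. The function names and the two non-Pixel titles in `catalog` are hypothetical, chosen only for illustration; a library such as scikit-learn's `TfidfVectorizer` would do the same job at scale.

```python
import math
import re
from collections import Counter

def tokenize(title: str) -> list[str]:
    # Assumption: lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", title.lower())

def tfidf_vectors(titles: list[str]) -> list[dict[str, float]]:
    """Build one sparse tf-idf profile (term -> weight) per title."""
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    # Document frequency: in how many titles each term appears.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed idf: terms that appear in every title still get a weight of 1.
    return [
        {term: tf * (math.log((1 + n) / (1 + df[term])) + 1) for term, tf in doc.items()}
        for doc in docs
    ]

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(query_index: int, titles: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank all other titles by cosine similarity to titles[query_index]."""
    vectors = tfidf_vectors(titles)
    query = vectors[query_index]
    scored = [
        (titles[i], cosine(query, vec))
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

catalog = [
    "Google Pixel 2 64GB Clearly White (Unlocked) Smartphone",
    "Google Pixel 2 GSM/CDMA Google Unlocked (Clearly White, 64GB, US warranty)",
    "Apple iPhone 8 64GB Space Gray (Unlocked)",  # hypothetical listing
    "Samsung Galaxy S9 64GB Midnight Black",      # hypothetical listing
]
# The top match for the first Pixel 2 listing should be the second one.
print(most_similar(0, catalog, top_k=1))
```

Because the idf weight shrinks for tokens shared by many titles (here `64gb`, which appears in all four), distinctive tokens such as `pixel` dominate the score. In a real application you would probably still need a similarity threshold before treating the top match as the same product.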

Before trying the approach above, you could simply compute the cosine similarity for every item-item pair, passing two product titles at a time as arguments. Read this answer.
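A minimal sketch of that simpler baseline, assuming titles are compared as raw term counts (no tf-idf) after lowercasing and splitting on non-alphanumeric characters; `title_cosine` and `all_pairs` are hypothetical names.

```python
import math
import re
from collections import Counter
from itertools import combinations

def title_cosine(title_a: str, title_b: str) -> float:
    """Cosine similarity of two titles using raw term counts."""
    a = Counter(re.findall(r"[a-z0-9]+", title_a.lower()))
    b = Counter(re.findall(r"[a-z0-9]+", title_b.lower()))
    # Counter returns 0 for missing terms, so unshared terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def all_pairs(titles: list[str]) -> list[tuple[str, str, float]]:
    """Every unordered pair of titles with its similarity, highest first."""
    pairs = [(x, y, title_cosine(x, y)) for x, y in combinations(titles, 2)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)

print(title_cosine(
    "Google Pixel 2 64GB Clearly White (Unlocked) Smartphone",
    "Google Pixel 2 GSM/CDMA Google Unlocked (Clearly White, 64GB, US warranty)",
))
```

Comparing all pairs grows quadratically with the number of products, which is fine for a small, user-chosen catalog like the one described in the question.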