Our users give a 2 to 3 sentence description of their profession.
Example user A (profile description): I am a data scientist living in Berlin, I like Japanese food and I am also interested in arts.
Then they also describe what kind of person they are looking for.
Example user B (looking for description): I am looking for a data scientist, sales guy and an architect for my new home.
We want to match these on the basis that user A is a data scientist and user B is looking for a data scientist.
At first we required users to hand-select the tags they want to be matched on. An example of the kind of tags we provided:
Environmental Services
Events Services
Executive Office
Facilities Services
Human Resources
Information Services
Management Consulting
Outsourcing/Offshoring
Professional Training & Coaching
Security & Investigations
Staffing & Recruiting
Supermarkets
Wholesale
Energy & Mining
Mining & Metals
Oil & Energy
Utilities
Manufacturing
Automotive
Aviation & Aerospace
Chemicals
Defense & Space
Electrical & Electronic Manufacturing
Food Production
Industrial Automation
Machinery
Japanese Food
...
This system kinda works but we have a lot of tags and want to create more 'distant' relations.
So we need:
- to know which parts of a description are important. Could we use POS tagging for this, to extract 'data science', 'Japanese food', etc.?
- then to compare the vectors of each part; e.g. 'data science' and 'statistics' are a good match, and 'Japanese food' and 'Asian food' are a good match.
- and to set a threshold.
- This should result in a more convenient way of matching, right?
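To improve tag-based matching with a large set of tags, you can use part-of-speech (POS) tagging to pull the important keywords out of each description, typically the nouns and noun phrases such as "data scientist" or "Japanese food". These keywords serve as the focal points for matching. Convert them into vector representations with an embedding technique like Word2Vec, which places semantically related terms close together (for multi-word phrases, averaging the word vectors is a common starting point). TF-IDF vectors are an alternative, but they only capture word overlap, so they will not relate "Japanese food" to "Asian food" beyond the shared word "food".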
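As a rough illustration of the keyword extraction step, here is a minimal sketch that needs no NLP library. Note that it is not real POS tagging: it swaps in a simpler stopword-splitting heuristic (keep runs of consecutive non-stopwords), and the `STOPWORDS` list and the `candidate_phrases` name are assumptions made up for this example. A POS tagger from a library such as spaCy or NLTK would be the more robust choice.

```python
import re

# Hypothetical, deliberately small stopword list for this example only.
# A real system would use a fuller list or a proper POS tagger.
STOPWORDS = {
    "i", "am", "a", "an", "the", "and", "in", "for", "my", "also", "like",
    "living", "looking", "interested", "new", "of", "to", "is", "are",
}

def candidate_phrases(text):
    """Return runs of consecutive non-stopwords, lowercased, in order."""
    phrases = []
    # Split on punctuation first so a phrase never spans a comma or full stop.
    for chunk in re.split(r"[,.;:!?]", text.lower()):
        current = []
        for word in re.findall(r"[a-z]+", chunk):
            if word in STOPWORDS:
                # A stopword ends the current phrase, if there is one.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases

profile = ("I am a data scientist living in Berlin, I like Japanese food "
           "and I am also interested in arts.")
print(candidate_phrases(profile))
```

On the example profile from the question this yields "data scientist", "berlin", "japanese food" and "arts". The heuristic depends entirely on the stopword list, so treat it as a starting point rather than a replacement for a tagger.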
Next, compare the vectors of different tags to measure their similarity. Cosine similarity is a common metric for quantifying how related two tags are. Set a similarity threshold to determine which tags count as relevant matches, and tune it on real examples: a higher threshold gives fewer but closer matches, while a lower one admits more distant relations.
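The comparison and threshold step can be sketched as follows. The vectors in `TOY_VECTORS` are made up by hand purely for illustration (real embeddings would come from a trained model and have hundreds of dimensions), and the `THRESHOLD` value of 0.8 is an arbitrary assumption, not a recommendation.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)

# Hand-made 3-dimensional vectors, for illustration only (an assumption).
TOY_VECTORS = {
    "data science": [0.9, 0.1, 0.0],
    "statistics": [0.8, 0.2, 0.1],
    "japanese food": [0.0, 0.9, 0.3],
    "asian food": [0.1, 0.8, 0.4],
}

THRESHOLD = 0.8  # arbitrary value for this sketch; tune on real data

def matches(tag, vectors, threshold):
    """Return (other_tag, score) pairs at or above the threshold, best first."""
    query = vectors[tag]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != tag
    ]
    kept = [(other, score) for other, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

for tag in ("data science", "japanese food"):
    print(tag, "->", [name for name, _ in matches(tag, TOY_VECTORS, THRESHOLD)])
```

With these toy vectors, "data science" matches only "statistics" and "japanese food" matches only "asian food". With real embeddings the scores are less clear-cut, which is why the threshold needs tuning.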
When users select tags, compare their chosen tags with others in your database. Present potential matches whose similarity scores exceed the threshold. Additionally, handle variations in tags using techniques like synonym mapping or stemming to ensure robust matching.
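For the tag variation part, a minimal sketch of synonym mapping might look like this. The `SYNONYMS` table and the `normalize` and `shared_tags` helpers are hypothetical names invented for the example. In practice the table would be curated or derived from your own tag data, and a stemmer from an NLP library could handle inflected forms.

```python
# Hypothetical synonym table mapping surface forms to one canonical tag.
SYNONYMS = {
    "data scientist": "data science",
    "statistician": "statistics",
    "sales guy": "sales",
    "salesperson": "sales",
}

def normalize(tag):
    """Lowercase, collapse whitespace, then map known variants to a canonical tag."""
    cleaned = " ".join(tag.lower().split())
    return SYNONYMS.get(cleaned, cleaned)

def shared_tags(profile_tags, wanted_tags):
    """Canonical tags that appear in both a profile and a 'looking for' list."""
    profile = {normalize(t) for t in profile_tags}
    wanted = {normalize(t) for t in wanted_tags}
    return sorted(profile & wanted)

user_a = ["Data Scientist", "Japanese food", "arts"]
user_b = ["data scientist", "sales guy", "architect"]
print(shared_tags(user_a, user_b))
```

Normalizing before the vector comparison means that trivially different spellings are matched exactly, so the similarity threshold is only needed for the genuinely distant relations.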
This approach allows for more nuanced and distant tag relations than exact tag matching, for example pairing "data science" with "statistics". It costs more computation than a plain tag lookup, and the threshold needs tuning against real examples, but it should give users better recommendations without making them hand-select tags.