To implement fuzzy matching between items, I'd like to generate vector embeddings for items that have multiple attributes. Those attributes include:
- text fields with phrases
- a list of email addresses (a strong indicator of similarity)
- several numeric properties
Most examples only show how to create an embedding from an unstructured block of text. My understanding is that two techniques would need to be applied:
- generate partial embeddings for each of the attributes
- combine the partial embeddings into a single vector
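The two steps above could look roughly like the following, using weighted concatenation. This is one common technique for the combining step, not the only one. It is a minimal sketch that assumes each attribute already has a partial embedding of fixed dimension and that the weights (illustrative values) sum to 1. Each part is normalized to unit length and scaled by sqrt(weight) before concatenation. The helper names and the toy vectors are hypothetical.

```python
import math

def normalize(vec):
    """Return vec scaled to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def combine(parts, weights):
    """Weighted concatenation of partial embeddings.

    Each part is unit-normalized and scaled by sqrt(weight), in the fixed key
    order of `weights`, so every item's combined vector has the same layout.
    """
    combined = []
    for name in weights:
        scale = math.sqrt(weights[name])
        combined.extend(scale * x for x in normalize(parts[name]))
    return combined

# Toy partial embeddings (made-up values, for illustration only).
weights = {"text": 0.3, "emails": 0.5, "numeric": 0.2}  # sums to 1
item_a = {"text": [0.1, 0.9, 0.2], "emails": [1, 0, 1, 0], "numeric": [0.5, -1.2]}
item_b = {"text": [0.2, 0.8, 0.1], "emails": [1, 0, 0, 0], "numeric": [0.4, -1.0]}

combined_score = cosine(combine(item_a, weights), combine(item_b, weights))
weighted_sum = sum(w * cosine(item_a[k], item_b[k]) for k, w in weights.items())
print(combined_score, weighted_sum)  # the two values agree up to rounding
```

With unit-normalized parts and weights summing to 1, the combined vector has unit length. Its dot product with another combined vector equals the weighted sum of the per-attribute cosine similarities. The weights therefore control each attribute's influence directly, and the single combined vector can still be stored in an ordinary cosine or inner-product vector index. The weights themselves remain a tuning choice.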
How can partial embeddings be generated for the non-text (numeric and email list) attributes? Are there any techniques or libraries for combining embeddings with weights? Or is this a wrong-headed approach to similarity matching for this kind of data?
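For the non-text attributes, two common options are feature hashing for the email list and standardization (z-scores) for the numeric properties. The following is a minimal sketch using only the standard library. The helper names, the 64-slot vector size, SHA-256 as the hash, and the exp(-distance) mapping are all illustrative assumptions, not a library API. The means and standard deviations are assumed to be computed from your own dataset; the values below are made up.

```python
import hashlib
import math

def _clean(emails):
    """Normalize addresses so case and stray whitespace don't prevent matches."""
    return {e.strip().lower() for e in emails}

def email_vector(emails, dim=64):
    """Feature-hash an email list into a fixed-size multi-hot vector.

    Each address sets one of `dim` slots to 1.0; distinct addresses can collide.
    """
    vec = [0.0] * dim
    for email in _clean(emails):
        digest = hashlib.sha256(email.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] = 1.0
    return vec

def jaccard(a, b):
    """Exact overlap of two email lists: |A & B| / |A | B|."""
    sa, sb = _clean(a), _clean(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def zscores(values, means, stds):
    """Standardize numeric properties so large-valued ones don't dominate."""
    return [(v - m) / s for v, m, s in zip(values, means, stds)]

def numeric_similarity(a, b, means, stds):
    """Map the distance between standardized values to a similarity in (0, 1]."""
    return math.exp(-math.dist(zscores(a, means, stds), zscores(b, means, stds)))

# Illustrative usage with made-up statistics.
means, stds = [100.0, 3.0], [25.0, 1.5]
print(jaccard(["Ann@example.com", "bob@example.com"], ["bob@example.com "]))  # 0.5
print(numeric_similarity([120.0, 2.0], [110.0, 3.5], means, stds))
```

Because the multi-hot vector is binary, its cosine similarity is |A ∩ B| / sqrt(|A| |B|) apart from hash collisions, so it can serve directly as the email partial embedding. For numeric properties, cosine ignores magnitude, so a distance-based score such as `numeric_similarity` may be more meaningful. The z-scores themselves can be used if a vector is required.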