Comparing two Google text embedding models


I'm comparing two models for use in a prototype project: sentence-t5 base and LEALLA. There are a few things that I'm interested in:

  1. Speed. Which one is faster? Even the smallest LEALLA model is larger than the S-T5 base model.
  2. Strengths. Based on the TFHub description, LEALLA was built to be small and language-agnostic. Is it still competitive with S-T5 for general-purpose English embeddings?
  3. Input type. I'm embedding raw HTML job descriptions. Will the HTML markup hurt embedding quality disproportionately in one model vs. the other? And how does embedding HTML affect embedding models in general?
  4. Overall performance. I saw that S-T5 was evaluated on the SentEval benchmark (which I know nothing about), but LEALLA used a completely different set of benchmarks focused on multi-language sentences. Could someone run SentEval on LEALLA too and interpret the results? If that's not possible, what would be the code if I wanted to do it myself?
  5. Fine-tuning. Are both models fine-tunable? What would the fine-tuning code look like?
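For the speed question (1), my plan is to measure encode latency myself rather than rely on model size as a proxy. Here's a rough harness I'd use; `benchmark` is my own helper name, and `encode` is a stand-in for whichever model's inference call I end up wrapping (sentence-transformers, TF-Hub, etc.):

```python
import time

def benchmark(encode, texts, warmup=2, repeats=5):
    """Time an embedding function over a fixed batch of texts.

    `encode` is a placeholder for a real model's inference call;
    the dummy below just shows the harness running.
    """
    for _ in range(warmup):              # warm-up runs absorb lazy init / caching
        encode(texts)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        encode(texts)
        timings.append(time.perf_counter() - start)
    return min(timings)                  # best-of-N reduces scheduler noise

# Dummy encoder so the harness is runnable as-is; swap in a real model.
dummy_encode = lambda texts: [[len(t)] for t in texts]
best = benchmark(dummy_encode, ["a job ad", "another job ad"])
print(f"best latency: {best:.6f}s")
```

Reporting the minimum over several repeats (after warm-up) is deliberate: it filters out one-off noise from the OS scheduler and model initialization.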
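For context on question 3: rather than feeding raw HTML to either model, I'd strip the markup first. This is a sketch using only the standard library's `html.parser`; `TextExtractor` and `html_to_text` are my own placeholder names:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping script/style."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0   # >0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    """Flatten an HTML job description into plain text for embedding."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

job_html = '<div><h1>Data Engineer</h1><script>track();</script><p>Build pipelines.</p></div>'
print(html_to_text(job_html))   # → Data Engineer Build pipelines.
```

Even so, I'd like to know whether leftover markup degrades one model more than the other, since neither was presumably trained on HTML.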