I'm comparing two models for use in a prototype project: sentence-t5-base and LEALLA. There are a few things I'm interested in:
- Speed. Which one is faster in practice? Even the smallest LEALLA model is larger than sentence-t5-base. (I've sketched a rough timing comparison after this list.)
- Strengths. Based on the TFHub description, LEALLA was built to be lightweight and language-agnostic. Is it still competitive with S-T5 for general-purpose English embeddings?
- Input format. I'm embedding HTML job descriptions. Will the markup hurt embedding quality disproportionately for one model vs. the other, and how does embedding raw HTML affect embedding models in general? (A preprocessing sketch follows the list.)
- Overall performance. I saw that S-T5 was evaluated on the SentEval benchmark (which I know nothing about), while LEALLA was reported on a completely different set of benchmarks focused on multilingual sentence pairs. Could someone run SentEval on LEALLA and interpret the results? If that's not possible, what code would I need to do it myself? (I've sketched a starting point below.)
- Fine-tuning. Are both models fine-tunable, and what code would I write to fine-tune them? (See the last sketch after this list.)
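
For the speed question, here's the rough micro-benchmark I had in mind; parameter count alone doesn't settle it, since architecture and runtime matter too. This is a minimal sketch, assuming sentence-t5-base is available through the sentence-transformers package and LEALLA through TensorFlow Hub; the TFHub URL is my best guess from the model family page and worth double-checking.

```python
import time

import tensorflow as tf
import tensorflow_hub as hub
from sentence_transformers import SentenceTransformer

sentences = ["We are hiring a senior backend engineer."] * 256

# sentence-t5-base via sentence-transformers (PyTorch).
st5 = SentenceTransformer("sentence-transformers/sentence-t5-base")
st5.encode(sentences[:1])  # warm-up
start = time.perf_counter()
st5.encode(sentences, batch_size=32)
print(f"sentence-t5-base: {time.perf_counter() - start:.2f}s")

# LEALLA-small via TFHub (URL assumed; check the LEALLA model page).
lealla = hub.KerasLayer("https://tfhub.dev/google/LEALLA/LEALLA-small/1")
inputs = tf.constant(sentences)
lealla(inputs[:1])  # warm-up / graph tracing
start = time.perf_counter()
for i in range(0, len(sentences), 32):
    lealla(inputs[i : i + 32])
print(f"LEALLA-small: {time.perf_counter() - start:.2f}s")
```

To make the comparison fair, I'd run both on the same hardware with the same batch size and exclude the warm-up call, since the first TF call includes graph tracing.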
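
On the HTML question, my working assumption is that both models were trained on natural-language text, so tags, attributes, and entities are mostly out-of-distribution noise that also burns sequence length before the encoder sees the actual job description. The usual mitigation, regardless of model, is to strip the markup first; here's a minimal sketch with BeautifulSoup:

```python
from bs4 import BeautifulSoup

def html_to_text(html: str) -> str:
    """Strip tags and collapse whitespace so the encoder sees plain prose."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop non-content elements entirely.
    for tag in soup(["script", "style"]):
        tag.decompose()
    text = soup.get_text(separator=" ")
    # Collapse runs of whitespace (including non-breaking spaces).
    return " ".join(text.split())

job_html = "<div><h1>Backend Engineer</h1><p>We&nbsp;are hiring.</p></div>"
print(html_to_text(job_html))  # -> "Backend Engineer We are hiring."
```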
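
For running SentEval myself, my understanding is that SentEval only needs a `batcher` function mapping a batch of tokenized sentences to vectors, so any encoder can be plugged in. Here's the sketch I'd start from, assuming SentEval is installed from the facebookresearch/SentEval repo with its data downloaded, and assuming the LEALLA TFHub URL below is correct:

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import senteval  # from https://github.com/facebookresearch/SentEval

# LEALLA-base from TFHub; URL assumed, check the model page.
encoder = hub.KerasLayer("https://tfhub.dev/google/LEALLA/LEALLA-base/1")

def prepare(params, samples):
    # No vocabulary building needed for a pretrained encoder.
    return

def batcher(params, batch):
    # SentEval hands over tokenized sentences; rejoin them into strings.
    sentences = [" ".join(tokens) if tokens else "." for tokens in batch]
    embeddings = encoder(tf.constant(sentences))
    return np.asarray(embeddings)

params = {"task_path": "SentEval/data", "usepytorch": False, "kfold": 10}
se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(["STS12", "STSBenchmark", "MR", "CR"])
print(results)
```

The STS tasks report Pearson/Spearman correlations and the classification tasks (MR, CR, etc.) report accuracy, which could then be set side by side with the SentEval numbers reported for S-T5.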
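
On fine-tuning, both should be fine-tunable in principle: sentence-t5-base ships in the sentence-transformers ecosystem, and LEALLA can be wrapped as a trainable Keras layer (`hub.KerasLayer(url, trainable=True)`) inside a `tf.keras.Model` with a contrastive loss. Here's a sketch for the sentence-t5 side, with hypothetical training pairs standing in for whatever similar-text data I'd actually use:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/sentence-t5-base")

# Hypothetical pairs: texts that should embed close together.
train_examples = [
    InputExample(texts=["senior backend engineer", "experienced server-side developer"]),
    InputExample(texts=["data analyst, SQL required", "analytics role with SQL"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Contrastive loss that treats other in-batch pairs as negatives.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=10)
model.save("st5-job-descriptions")
```

Is that roughly the right approach for this kind of model, or is there a better-suited loss for job-description similarity?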