MTEB
The Massive Text Embedding Benchmark (MTEB) evaluates embeddings across 8 tasks: bitext mining, classification, clustering, pair classification, reranking, retrieval, semantic textual similarity (STS), and summarization.
About this test
- What it measures
- Overall embedding quality across diverse NLP tasks.
- How it was administered
- 56 datasets across 8 task types. The reported score is the average over datasets; the main metric is NDCG@10 for retrieval, Spearman correlation for STS, and accuracy for classification.
Model rankings
Models ranked by score on this benchmark. Higher is better.
| Rank | Model | Provider | Score | Percentile | Tags |
|---|---|---|---|---|---|
| 1 | | Voyage AI | 67.5 | p97 | Embedding, Proprietary |
| 2 | | Cohere | 66.3 | p95 | Embedding, Proprietary |
| 3 |