google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
☆48 · Updated last year
Alternatives and similar repositories for swim-ir:
Users interested in swim-ir are also comparing it to the repositories listed below:
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆19 · Updated 2 months ago
- InstructIR: a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc… ☆31 · Updated 10 months ago
- No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval ☆29 · Updated 2 years ago
- 🌏 Modular retrievers for zero-shot multilingual IR.