SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
☆49 · Nov 13, 2023 · Updated 2 years ago
Alternatives and similar repositories for swim-ir
Users who are interested in swim-ir are comparing it to the libraries listed below.
- A fast search index for SPLADE sparse retrieval models, implemented in Python with NumPy and Numba ☆37 · Oct 16, 2025 · Updated 5 months ago
- This repository helps you evaluate your models on the FreshStack benchmark! ☆34 · Dec 9, 2025 · Updated 3 months ago
- The Python implementation of CRISP: Clustering Multi-Vector Representations for Denoising and Pruning ☆27 · Jul 27, 2025 · Updated 8 months ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models on any IR dataset with a single click. ☆47 · Jul 25, 2023 · Updated 2 years ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆586 · Updated this week
- [ACL 2025] AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark ☆165 · Updated this week
- Codebase of ACL 2024 paper "Spiral of Silence: How is Large Language Model Killing Information Retrieval?—A Case Study on Open Domain Ques… ☆16 · Jun 4, 2024 · Updated last year
- 🌏 Modular retrievers for zero-shot multilingual IR. ☆30 · Mar 6, 2024 · Updated 2 years ago
- A large-scale multilingual dataset for information retrieval, with thorough human annotations across 18 diverse languages. ☆202 · Jul 31, 2024 · Updated last year
- Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation ☆15 · Apr 23, 2025 · Updated 11 months ago
- Use contrastive learning to train a large language model (LLM) as a retriever ☆12 · Jul 19, 2024 · Updated last year
- Tevatron: Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo at SIGIR 2023 and SIGIR 2025. ☆733 · Jan 26, 2026 · Updated 2 months ago
- Scalable training for dense retrieval models. ☆298 · Jun 10, 2025 · Updated 9 months ago
- An easy-to-use Python toolkit for flexibly adapting various neural ranking models to a target domain. ☆60 · May 17, 2023 · Updated 2 years ago
- Mr. TyDi is a multilingual benchmark dataset built on TyDi, covering eleven typologically diverse languages. ☆80 · Feb 16, 2022 · Updated 4 years ago
- Retrieval-Augmented Generation battle! ☆64 · Updated this week
- Author implementation of the paper "Don’t paraphrase, detect! Rapid and Effective Data Collection for Semantic Parsing" ☆20 · Oct 5, 2020 · Updated 5 years ago
- ☆14 · Oct 28, 2023 · Updated 2 years ago
- Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory. ☆11 · Updated this week
- Question and answer retrieval in Turkish with BERT ☆14 · Nov 30, 2021 · Updated 4 years ago
- A powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvement: "GPL: … ☆341 · Jul 6, 2023 · Updated 2 years ago
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval ☆38 · Aug 4, 2025 · Updated 7 months ago
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆984 · May 3, 2024 · Updated last year
- Repository for Skill Set Optimization ☆14 · Jul 26, 2024 · Updated last year
- ☆54 · Updated this week
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 · Aug 12, 2023 · Updated 2 years ago
- An Image/Text Retrieval Test Collection to Support Multimedia Content Creation ☆21 · Oct 21, 2023 · Updated 2 years ago
- ☆14 · Mar 31, 2024 · Updated last year
- Accompanying repository of our AAAI-20 paper "Fine-Grained Argument Unit Recognition and Classification." ☆21 · Jul 27, 2020 · Updated 5 years ago
- ML Project control panel ☆10 · Sep 30, 2022 · Updated 3 years ago
- Benchmarking library for RAG ☆263 · Mar 11, 2026 · Updated 2 weeks ago
- Generating Summaries with Controllable Readability Levels (EMNLP 2023) ☆15 · Aug 6, 2025 · Updated 7 months ago
- ☆10 · Feb 9, 2024 · Updated 2 years ago
- ☆13 · Jan 22, 2025 · Updated last year
- An unofficial implementation of Tensor4D with support for the D-NeRF dataset ☆13 · Nov 8, 2023 · Updated 2 years ago
- Evaluation tools shared across anserini, pyserini, and pygaggle ☆35 · Mar 19, 2026 · Updated last week
- A Scholarly Knowledge Graph Benchmark Dataset ☆22 · Jan 12, 2026 · Updated 2 months ago
- Model implementation for the contextual embeddings project ☆47 · Jun 2, 2025 · Updated 9 months ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ☆13 · Nov 21, 2023 · Updated 2 years ago