SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
☆49 · Updated Nov 13, 2023
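Summarize-then-ask prompting, as we understand it from the SWIM-IR description, splits query generation into two model calls. First, the model summarizes the passage. Then it writes a query in the target language grounded in that summary. The sketch below shows only that two-stage structure. It assumes any text-in/text-out model callable in place of PaLM 2. The prompt wording, `summarize_then_ask`, `QueryPassagePair`, and `fake_llm` are hypothetical illustrations, not the prompts or code used to build SWIM-IR.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: prompt string in, generated text out.
# SWIM-IR used PaLM 2 for this role; here it is any callable.
LLM = Callable[[str], str]

# Illustrative prompt templates (assumptions, not the SWIM-IR prompts).
SUMMARIZE_TEMPLATE = (
    "Summarize the key facts of the following passage.\n"
    "Passage: {passage}\nSummary:"
)
ASK_TEMPLATE = (
    "Passage: {passage}\nSummary: {summary}\n"
    "Write a question in {language} that this passage answers.\nQuestion:"
)


@dataclass
class QueryPassagePair:
    query: str
    passage: str
    language: str


def summarize_then_ask(passage: str, language: str, llm: LLM) -> QueryPassagePair:
    """Generate one synthetic training pair in two stages."""
    # Stage 1: condense the passage so the query targets its salient content.
    summary = llm(SUMMARIZE_TEMPLATE.format(passage=passage)).strip()
    # Stage 2: ask a target-language query conditioned on passage + summary.
    prompt = ASK_TEMPLATE.format(passage=passage, summary=summary, language=language)
    query = llm(prompt).strip()
    return QueryPassagePair(query=query, passage=passage, language=language)


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for illustration only."""
    if prompt.rstrip().endswith("Summary:"):
        return "The Eiffel Tower is in Paris."
    return "¿Dónde está la Torre Eiffel?"


if __name__ == "__main__":
    pair = summarize_then_ask(
        "The Eiffel Tower is a wrought-iron lattice tower in Paris.", "Spanish", fake_llm
    )
    print(pair.query)
```

Keeping the two stages as separate calls lets the summary step be inspected or filtered before any query is generated.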
Alternatives and similar repositories for swim-ir
Users that are interested in swim-ir are comparing it to the libraries listed below.
- Fast search index for SPLADE sparse retrieval models, implemented in Python using NumPy and Numba · ☆38 · Updated Oct 16, 2025
- The Python implementation of CRISP: Clustering Multi-Vector Representations for Denoising and Pruning · ☆27 · Updated Jul 27, 2025
- SPRINT Toolkit helps you evaluate diverse neural sparse models on any IR dataset with a single click · ☆48 · Updated Jul 25, 2023
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking · ☆608 · Updated Jun 19, 2026
- [ACL 2025] AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark · ☆165 · Updated Mar 29, 2026
- 🌏 Modular retrievers for zero-shot multilingual IR · ☆30 · Updated Mar 6, 2024
- Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation · ☆16 · Updated Apr 23, 2025
- Use contrastive learning to train a large language model (LLM) as a retriever · ☆12 · Updated Jul 19, 2024
- Provides a common interface to many IR measure tools · ☆101 · Updated Feb 17, 2026
- (no description) · ☆24 · Updated Apr 29, 2026
- Data and code for the paper "ODSum: New Benchmarks for Open Domain Multi-Document Summarization" · ☆11 · Updated Sep 20, 2024
- Tevatron: Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025 · ☆742 · Updated May 18, 2026
- A multilingual version of the MS MARCO passage ranking dataset · ☆148 · Updated Oct 19, 2023
- Scalable training for dense retrieval models · ☆299 · Updated May 18, 2026
- An easy-to-use Python toolkit for flexibly adapting various neural ranking models to a target domain · ☆60 · Updated May 17, 2023
- Mr. TyDi is a multilingual benchmark dataset built on TyDi, covering eleven typologically diverse languages · ☆83 · Updated Feb 16, 2022
- Author implementation of the paper "Don’t paraphrase, detect! Rapid and Effective Data Collection for Semantic Parsing" · ☆20 · Updated Oct 5, 2020
- Retrieval-Augmented Generation battle! · ☆67 · Updated Apr 18, 2026
- (no description) · ☆14 · Updated Oct 28, 2023
- A package for generating synthetic data and fine-tuning a GLiNER model · ☆14 · Updated Jun 5, 2024
- This is the code repo for our paper "Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Searc… · ☆27 · Updated Mar 2, 2025
- AI model designed to test effectiveness at handling external ethical attacks · ☆11 · Updated Feb 9, 2026
- Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvement: "GPL: … · ☆342 · Updated Jul 6, 2023
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval · ☆38 · Updated Aug 4, 2025
- Rank-Biased Precision, Overlap, Recall, and Alignment · ☆12 · Updated Jun 15, 2026
- (no description) · ☆52 · Updated Nov 18, 2025
- SPLADE: sparse neural search (SIGIR21, SIGIR22) · ☆995 · Updated May 3, 2024
- Repository for Skill Set Optimization · ☆14 · Updated Jul 26, 2024
- (no description) · ☆57 · Updated Apr 18, 2026
- An Image/Text Retrieval Test Collection to Support Multimedia Content Creation · ☆21 · Updated Oct 21, 2023
- (no description) · ☆14 · Updated Mar 31, 2024
- (no description) · ☆11 · Updated Mar 19, 2023
- Accompanying repository of our AAAI-20 paper "Fine-Grained Argument Unit Recognition and Classification." · ☆21 · Updated Jul 27, 2020
- SKT A.X LLM K1 · ☆30 · Updated Feb 11, 2026
- (no description) · ☆20 · Updated Mar 30, 2024
- Allows for automatic dispatching of bound functions when properties are changed · ☆11 · Updated Oct 24, 2022
- MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale · ☆14 · Updated Mar 22, 2021
- Generating Summaries with Controllable Readability Levels (EMNLP 2023) · ☆15 · Updated Apr 8, 2026
- Benchmarking library for RAG · ☆274 · Updated Mar 11, 2026
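One listed repository covers Rank-Biased Precision (RBP). As a hedged illustration of the metric itself, the sketch below follows the standard Moffat and Zobel definition rather than that repository's code: RBP = (1 - p) · Σ r_i · p^(i-1), where p is the user's persistence. The function name `rank_biased_precision` and the residual return value are our own choices. The residual p^d assumes relevance grades in [0, 1].

```python
from typing import Sequence, Tuple


def rank_biased_precision(relevances: Sequence[float], p: float = 0.8) -> Tuple[float, float]:
    """Return (RBP score, residual) for a ranked list of relevance grades in [0, 1].

    RBP = (1 - p) * sum over ranks i = 1..d of r_i * p**(i - 1),
    where p is the probability that the user moves on to the next rank.
    The residual p**d is the most that unjudged ranks below depth d could add.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("persistence p must lie strictly between 0 and 1")
    # enumerate starts at 0, so p**i equals p**(rank - 1) for rank = i + 1.
    score = (1.0 - p) * sum(r * p**i for i, r in enumerate(relevances))
    residual = p ** len(relevances)
    return score, residual


if __name__ == "__main__":
    # Relevant documents at ranks 1 and 3, patient-ish user with p = 0.5.
    print(rank_biased_precision([1, 0, 1], p=0.5))  # (0.625, 0.125)
```

Reporting the residual alongside the score makes it explicit how much the result could still change if deeper ranks were judged.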