project-miracl/miracl

A large-scale multilingual dataset for information retrieval, with thorough human annotations across 18 diverse languages.

☆211

Alternatives and similar repositories for miracl

Users who are interested in miracl are comparing it to the repositories listed below.

castorini / mr.tydi
Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages.
☆83 · Feb 16, 2022 · Updated 4 years ago
naver / splade
SPLADE: sparse neural search (SIGIR21, SIGIR22)
☆999 · May 3, 2024 · Updated 2 years ago
unicamp-dl / mMARCO
A multilingual version of MS MARCO passage ranking dataset
☆148 · Oct 19, 2023 · Updated 2 years ago
ssun32 / CLIRMatrix
☆18 · Jul 23, 2021 · Updated 5 years ago
google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…
☆50 · Nov 13, 2023 · Updated 2 years ago
texttron / tevatron
Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.
☆743 · Jul 18, 2026 · Updated last week
hltcoe / patapsco
Cross language information retrieval pipeline
☆19 · Jan 12, 2026 · Updated 6 months ago
capreolus-ir / capreolus
A toolkit for end-to-end neural ad hoc retrieval
☆98 · Aug 20, 2024 · Updated last year
allenai / ir_datasets
Provides a common interface to many IR ranking datasets.
☆390 · May 28, 2026 · Updated 2 months ago
embeddings-benchmark / mteb
MTEB: State-of-the-art evaluation of embeddings across languages and modalities
☆3,372 · Updated this week
beir-cellar / beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
☆2,255 · Oct 16, 2025 · Updated 9 months ago
hltcoe / ColBERT-X
CLIR version of ColBERT
☆73 · May 28, 2026 · Updated 2 months ago
castorini / pygaggle
a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini
☆354 · Dec 21, 2023 · Updated 2 years ago
ielab / asyncval
A toolkit for asynchronously validating dense retriever checkpoints during training.
☆27 · Aug 10, 2023 · Updated 2 years ago
castorini / pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
☆2,104 · Jul 16, 2026 · Updated last week
thongnt99 / learned-sparse-retrieval
Unified Learned Sparse Retrieval Framework
☆68 · May 13, 2024 · Updated 2 years ago
google-research / t5x_retrieval
☆102 · Dec 17, 2022 · Updated 3 years ago
castorini / dhr
Dense hybrid representations for text retrieval
☆65 · Apr 3, 2023 · Updated 3 years ago
nickvosk / sigir2020-query-resolution
☆13 · Jul 25, 2024 · Updated 2 years ago
castorini / anserini-tools
Evaluation tools shared across anserini, pyserini, and pygaggle
☆36 · Jul 14, 2026 · Updated 2 weeks ago
luyug / Dense
A toolkit for building dense retrievers with deep language models.
☆63 · Sep 24, 2021 · Updated 4 years ago
Georgetown-IR-Lab / covid-neural-ir
☆24 · Oct 23, 2020 · Updated 5 years ago
naver / bergen
Benchmarking library for RAG
☆276 · Jul 14, 2026 · Updated 2 weeks ago
castorini / hf-spacerini
Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆31 · Aug 5, 2023 · Updated 2 years ago
zetaalphavector / InPars
Inquisitive Parrots for Search
☆200 · Jun 5, 2025 · Updated last year
facebookresearch / dpr-scale
Scalable training for dense retrieval models.
☆298 · Jul 2, 2026 · Updated 3 weeks ago
luyug / COIL
NAACL2021 - COIL Contextualized Lexical Retriever
☆158 · Jul 27, 2021 · Updated 5 years ago
microsoft / MSMARCO-Passage-Ranking-Submissions
Submission archive for the MS MARCO passage ranking leaderboard
☆13 · Apr 21, 2023 · Updated 3 years ago
andrewyates / profane
A library for creating complex experimental pipelines
☆12 · Jul 25, 2022 · Updated 4 years ago
castorini / TREC-COVID
TREC-COVID results - this is a mirror of data on the TREC website in a more convenient format.
☆15 · Aug 31, 2020 · Updated 5 years ago
sebastian-hofstaetter / matchmaker
Training & evaluation library for text-based neural re-ranking and dense retrieval models built with PyTorch
☆265 · Jan 27, 2023 · Updated 3 years ago
thakur-nandan / income
INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ.
☆24 · Sep 24, 2023 · Updated 2 years ago
thunlp / ConvDR
Code repo for SIGIR 2021 paper "Few-Shot Conversational Dense Retrieval"
☆43 · Dec 9, 2021 · Updated 4 years ago
staoxiao / RetroMAE
Codebase for RetroMAE and beyond.
☆275 · Jun 7, 2024 · Updated 2 years ago
princeton-nlp / EntityQuestions
EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers https://arxiv.org/abs/2109.08535
☆148 · Feb 21, 2022 · Updated 4 years ago
UKPLab / gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: …
☆342 · Jul 6, 2023 · Updated 3 years ago
AkariAsai / CORA
This is the official implementation of NeurIPS 2021 "One Question Answering Model for Many Languages with Cross-lingual Dense Passage Ret…
☆71 · Apr 1, 2022 · Updated 4 years ago
facebookresearch / SEAL
Search Engines with Autoregressive Language models
☆296 · Apr 4, 2023 · Updated 3 years ago
luyug / GC-DPR
Train Dense Passage Retriever (DPR) with a single GPU
☆136 · Jun 16, 2021 · Updated 5 years ago