unicamp-dl/mMARCO

A multilingual version of the MS MARCO passage ranking dataset

☆148

Alternatives and similar repositories for mMARCO

Users that are interested in mMARCO are comparing it to the libraries listed below.

castorini / mr.tydi
Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages.
☆83 · Feb 16, 2022 · Updated 4 years ago
zetaalphavector / InPars
Inquisitive Parrots for Search
☆200 · Jun 5, 2025 · Updated last year
ielab / asyncval
A toolkit for asynchronously validating dense retriever checkpoints during training.
☆27 · Aug 10, 2023 · Updated 2 years ago
unicamp-dl / ExaRanker
☆29 · Feb 2, 2024 · Updated 2 years ago
hltcoe / ColBERT-X
CLIR version of ColBERT
☆73 · May 28, 2026 · Updated 2 months ago
castorini / pygaggle
a gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini
☆354 · Dec 21, 2023 · Updated 2 years ago
hltcoe / patapsco
Cross language information retrieval pipeline
☆19 · Jan 12, 2026 · Updated 6 months ago
beir-cellar / beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
☆2,255 · Oct 16, 2025 · Updated 9 months ago
luyug / Condenser
EMNLP 2021 - Pre-training architectures for dense retrieval
☆256 · Mar 18, 2022 · Updated 4 years ago
texttron / tevatron
Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.
☆743 · Jul 18, 2026 · Updated last week
project-miracl / miracl
A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages.
☆211 · Jul 31, 2024 · Updated last year
castorini / anserini-notebooks
Anserini notebooks
☆69 · Apr 2, 2023 · Updated 3 years ago
guilhermemr04 / scaling-zero-shot-retrieval
No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval
☆29 · Sep 26, 2022 · Updated 3 years ago
castorini / pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
☆2,104 · Jul 16, 2026 · Updated last week
unicamp-dl / InRanker
☆47 · Feb 7, 2024 · Updated 2 years ago
neuralmind-ai / visconde
☆40 · May 13, 2023 · Updated 3 years ago
capreolus-ir / capreolus
A toolkit for end-to-end neural ad hoc retrieval
☆98 · Aug 20, 2024 · Updated last year
ruanchaves / napolab
The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their…
☆72 · Jul 28, 2025 · Updated last year
ant-louis / xm-retrievers
🌏 Modular retrievers for zero-shot multilingual IR.
☆30 · Mar 6, 2024 · Updated 2 years ago
JetRunner / LaPraDoR
🦮 Code and pretrained models for Findings of ACL 2022 paper "LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrie…
☆49 · Apr 25, 2022 · Updated 4 years ago
google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…
☆50 · Nov 13, 2023 · Updated 2 years ago
sebastian-hofstaetter / neural-ranking-kd
Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation
☆117 · Jul 11, 2021 · Updated 5 years ago
luyug / COIL
NAACL2021 - COIL Contextualized Lexical Retriever
☆158 · Jul 27, 2021 · Updated 5 years ago
unicamp-dl / Lite-T5-Translation
☆27 · Jan 23, 2024 · Updated 2 years ago
naver / splade
SPLADE: sparse neural search (SIGIR21, SIGIR22)
☆999 · May 3, 2024 · Updated 2 years ago
ruanchaves / elmo
Supporting code for the paper "Portuguese Language Models and Word Embeddings: Evaluating on Semantic Similarity Tasks".
☆11 · Dec 8, 2022 · Updated 3 years ago
cvangysel / pytrec_eval
pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
☆350 · Oct 10, 2023 · Updated 2 years ago
sebastianruder / emnlp2021-multiqa-tutorial
EMNLP 2021 Tutorial: Multi-Domain Multilingual Question Answering
☆38 · Nov 7, 2021 · Updated 4 years ago
zetaalphavector / RAGElo
RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker
☆129 · Jun 26, 2026 · Updated last month
UKPLab / gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: …
☆342 · Jul 6, 2023 · Updated 3 years ago
allenai / ir_datasets
Provides a common interface to many IR ranking datasets.
☆390 · May 28, 2026 · Updated 2 months ago
AkariAsai / CORA
This is the official implementation of NeurIPS 2021 "One Question Answering Model for Many Languages with Cross-lingual Dense Passage Ret…
☆71 · Apr 1, 2022 · Updated 4 years ago
facebookresearch / dpr-scale
Scalable training for dense retrieval models.
☆298 · Jul 2, 2026 · Updated 3 weeks ago
unicamp-dl / PTT5
Code for training and evaluating T5 on Portuguese data.
☆91 · Dec 8, 2022 · Updated 3 years ago
nreimers / flax-sentence-embeddings
Shared code for training sentence embeddings with Flax / JAX
☆28 · Jul 15, 2021 · Updated 5 years ago
studio-ousia / bpr
Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering
☆175 · Jun 6, 2021 · Updated 5 years ago
Georgetown-IR-Lab / covid-neural-ir
☆24 · Oct 23, 2020 · Updated 5 years ago
sebastian-hofstaetter / teaching
Open-Source Information Retrieval Courses @ TU Wien
☆706 · Jun 12, 2023 · Updated 3 years ago
lintool / robust04-analysis
Meta-Analysis of Robust04 Papers (Yang et al., SIGIR 2019)
☆12 · May 25, 2019 · Updated 7 years ago