beir-cellar/beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

☆2,241

Alternatives and similar repositories for beir

Users who are interested in beir compare it to the libraries listed below.

castorini / pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
☆2,100 · Updated this week
texttron / tevatron
Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.
☆742 · Jul 3, 2026 · Updated 2 weeks ago
facebookresearch / DPR
Dense Passage Retriever is a set of tools and models for open-domain Q&A tasks.
☆1,867 · Apr 6, 2023 · Updated 3 years ago
allenai / ir_datasets
Provides a common interface to many IR ranking datasets.
☆390 · May 28, 2026 · Updated last month
naver / splade
SPLADE: sparse neural search (SIGIR21, SIGIR22)
☆999 · May 3, 2024 · Updated 2 years ago
cvangysel / pytrec_eval
pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
☆349 · Oct 10, 2023 · Updated 2 years ago
stanford-futuredata / ColBERT
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
☆3,902 · Oct 14, 2025 · Updated 9 months ago
castorini / pygaggle
A gaggle of deep neural architectures for text ranking and question answering, designed for Pyserini.
☆354 · Dec 21, 2023 · Updated 2 years ago
UKPLab / gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: …
☆343 · Jul 6, 2023 · Updated 3 years ago
zetaalphavector / InPars
Inquisitive Parrots for Search
☆200 · Jun 5, 2025 · Updated last year
castorini / anserini
Anserini is a Lucene toolkit for reproducible information retrieval research
☆1,150 · Updated this week
embeddings-benchmark / mteb
MTEB: State-of-the-art evaluation of embeddings across languages and modalities
☆3,362 · Updated this week
ict-bigdatalab / awesome-pretrained-models-for-information-retrieval
A curated list of awesome papers related to pre-trained models for information retrieval (a.k.a., pretraining for IR).
☆677 · Jan 7, 2024 · Updated 2 years ago
sebastian-hofstaetter / matchmaker
Training & evaluation library for text-based neural re-ranking and dense retrieval models built with PyTorch
☆265 · Jan 27, 2023 · Updated 3 years ago
facebookresearch / contriever
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
☆779 · Apr 7, 2023 · Updated 3 years ago
facebookresearch / dpr-scale
Scalable training for dense retrieval models.
☆298 · Jul 2, 2026 · Updated 2 weeks ago
castorini / docTTTTTquery
docTTTTTquery document expansion model
☆377 · Mar 25, 2023 · Updated 3 years ago
thunlp / OpenMatch
An Open-Source Package for Information Retrieval.
☆442 · Oct 7, 2022 · Updated 3 years ago
microsoft / ANCE
A novel embedding training algorithm that leverages ANN search and achieved SOTA retrieval on TREC DL 2019 and OpenQA benchmarks
☆385 · Jan 6, 2026 · Updated 6 months ago
unicamp-dl / mMARCO
A multilingual version of the MS MARCO passage ranking dataset
☆148 · Oct 19, 2023 · Updated 2 years ago
castorini / mr.tydi
Mr. TyDi is a multi-lingual benchmark dataset built on TyDi, covering eleven typologically diverse languages.
☆83 · Feb 16, 2022 · Updated 4 years ago
luyug / Condenser
EMNLP 2021 - Pre-training architectures for dense retrieval
☆256 · Mar 18, 2022 · Updated 4 years ago
huggingface / sentence-transformers
State-of-the-Art Embeddings, Retrieval, and Reranking
☆18,920 · Updated this week
sebastian-hofstaetter / teaching
Open-Source Information Retrieval Courses @ TU Wien
☆705 · Jun 12, 2023 · Updated 3 years ago
facebookresearch / KILT
Library for Knowledge Intensive Language Tasks
☆978 · Mar 31, 2022 · Updated 4 years ago
studio-ousia / bpr
Binary Passage Retriever (BPR) - an efficient passage retriever for open-domain question answering
☆175 · Jun 6, 2021 · Updated 5 years ago
terrier-org / pyterrier
A Python framework for performing information retrieval experiments, building on http://terrier.org/
☆508 · Updated this week
princeton-nlp / DensePhrases
[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.o…
☆607 · Jun 15, 2022 · Updated 4 years ago
facebookresearch / SEAL
Search Engines with Autoregressive Language models
☆296 · Apr 4, 2023 · Updated 3 years ago
lightonai / pylate
Late Interaction Models Training & Retrieval
☆875 · Jul 13, 2026 · Updated last week
Muennighoff / sgpt
SGPT: GPT Sentence Embeddings for Semantic Search
☆872 · Feb 17, 2024 · Updated 2 years ago
dorianbrown / rank_bm25
A Collection of BM25 Algorithms in Python
☆1,362 · May 2, 2026 · Updated 2 months ago
luyug / Reranker
Build Text Rerankers with Deep Language Models
☆265 · Feb 20, 2024 · Updated 2 years ago
castorini / dhr
Dense hybrid representations for text retrieval
☆65 · Apr 3, 2023 · Updated 3 years ago
microsoft / MSMARCO-Passage-Ranking
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, …
☆343 · Jun 12, 2023 · Updated 3 years ago
sebastian-hofstaetter / neural-ranking-kd
Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation
☆117 · Jul 11, 2021 · Updated 5 years ago
luyug / COIL
NAACL2021 - COIL Contextualized Lexical Retriever
☆158 · Jul 27, 2021 · Updated 4 years ago
thakur-nandan / sprint
SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset.
☆48 · Jul 25, 2023 · Updated 2 years ago
project-miracl / miracl
A large-scale multilingual dataset for Information Retrieval. Thorough human annotations across 18 diverse languages.
☆211 · Jul 31, 2024 · Updated last year