microsoft/MS-MARCO-Web-Search

A large-scale information-rich web dataset, featuring millions of real clicked query-document labels

☆351

Alternatives and similar repositories for MS-MARCO-Web-Search

Users that are interested in MS-MARCO-Web-Search are comparing it to the libraries listed below.

ielab / Starbucks
Starbucks: Improved Training for 2D Matryoshka Embeddings
☆25 · Jun 30, 2025 · Updated last year
lemurproject / ClueWeb22
☆17 · Dec 11, 2024 · Updated last year
webis-de / lightning-ir
One-stop shop for running and fine-tuning transformer-based language models for retrieval
☆65 · Jul 9, 2026 · Updated 2 weeks ago
ten-blue-links / fxt
A large scale feature extraction tool for text-based machine learning
☆32 · Sep 6, 2022 · Updated 3 years ago
xlang-ai / BRIGHT
[ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
☆207 · Sep 13, 2025 · Updated 10 months ago
webis-de / rank-distillm
Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-Ranking
☆25 · Apr 4, 2025 · Updated last year
castorini / rank_llm
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
☆610 · Updated this week
sebastian-hofstaetter / matchmaker
Training & evaluation library for text-based neural re-ranking and dense retrieval models built with PyTorch
☆265 · Jan 27, 2023 · Updated 3 years ago
terrierteam / ir_measures
Provides a common interface to many IR measure tools
☆102 · Feb 17, 2026 · Updated 5 months ago
ielab / PromptReps
Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
☆52 · Jan 6, 2026 · Updated 6 months ago
terrierteam / pyterrier_t5
☆17 · Apr 30, 2026 · Updated 2 months ago
castorini / pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
☆2,102 · Jul 16, 2026 · Updated last week
thongnt99 / learned-sparse-retrieval
Unified Learned Sparse Retrieval Framework
☆68 · May 13, 2024 · Updated 2 years ago
ielab / asyncval
A toolkit for asynchronously validating dense retriever checkpoints during training.
☆27 · Aug 10, 2023 · Updated 2 years ago
beir-cellar / beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
☆2,252 · Oct 16, 2025 · Updated 9 months ago
lightonai / pylate
Late Interaction Models Training & Retrieval
☆876 · Updated this week
tira-io / ir-experiment-platform
☆31 · Sep 25, 2024 · Updated last year
NEUIR / ConAE
[EMNLP 2022] This is the code repo for our EMNLP'22 paper "Dimension Reduction for Efficient Dense Retrieval via Conditional Autoencoder"…
☆13 · Oct 20, 2022 · Updated 3 years ago
OpenMatch / OpenMatch
An Open-Source Package for Information Retrieval
☆167 · Jul 13, 2026 · Updated last week
jakespringer / echo-embeddings
☆168 · Apr 17, 2024 · Updated 2 years ago
irgroup / repro_eval
A Python Interface to Reproducibility Measures of System-Oriented IR Experiments
☆11 · Dec 2, 2025 · Updated 7 months ago
google-deepmind / xtr
XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval
☆64 · Jun 20, 2024 · Updated 2 years ago
stanford-futuredata / ColBERT
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
☆3,903 · Oct 14, 2025 · Updated 9 months ago
facebookresearch / dpr-scale
Scalable training for dense retrieval models.
☆298 · Jul 2, 2026 · Updated 3 weeks ago
allenai / ir_datasets
Provides a common interface to many IR ranking datasets.
☆390 · May 28, 2026 · Updated last month
xhluca / bm25s
Fast BM25 search in Python, powered by Numpy and Numba
☆1,746 · Updated this week
mixedbread-ai / batched
The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…
☆161 · Jul 14, 2025 · Updated last year
osirrc / ciff
Common Index File Format to support interoperability between open-source IR engines
☆40 · Sep 19, 2024 · Updated last year
AIR-Bench / AIR-Bench
[ACL 2025] AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark
☆167 · Mar 29, 2026 · Updated 3 months ago
OpenBMB / RAG-DDR
This is the code repo for the paper "RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards".
☆23 · Oct 28, 2024 · Updated last year
rankbiased / rbstar
Rank-Biased Precision, Overlap, Recall, and Alignment
☆13 · Jun 15, 2026 · Updated last month
yuhongqian / ANCE-PRF
☆12 · May 17, 2022 · Updated 4 years ago
ContextualAI / gritlm
Generative Representational Instruction Tuning
☆697 · Jun 25, 2025 · Updated last year
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
☆226 · Dec 16, 2025 · Updated 7 months ago
staoxiao / RetroMAE
Codebase for RetroMAE and beyond.
☆275 · Jun 7, 2024 · Updated 2 years ago
fresh-stack / freshstack
This repository helps you evaluate your models on the FreshStack benchmark!
☆34 · Dec 9, 2025 · Updated 7 months ago
texttron / BrowseComp-Plus
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent (ACL 2026 Main)
☆319 · May 28, 2026 · Updated last month
texttron / tevatron
Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.
☆743 · Jul 18, 2026 · Updated last week
orionw / promptriever
The first dense retrieval model that can be prompted like an LM
☆93 · May 8, 2025 · Updated last year