A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
☆346 · Dec 16, 2024 · Updated last year
Alternatives and similar repositories for MS-MARCO-Web-Search
Users that are interested in MS-MARCO-Web-Search are comparing it to the libraries listed below
- ☆16 · Dec 11, 2024 · Updated last year
- One-stop shop for running and fine-tuning transformer-based language models for retrieval ☆63 · Feb 26, 2026 · Updated last week
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆22 · Jun 30, 2025 · Updated 8 months ago
- ☆161 · Apr 17, 2024 · Updated last year
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,500 · Feb 17, 2026 · Updated 2 weeks ago
- [ICLR 2025] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆190 · Sep 13, 2025 · Updated 5 months ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆579 · Feb 24, 2026 · Updated last week
- provides a common interface to many IR measure tools ☆96 · Feb 17, 2026 · Updated 2 weeks ago
- Late Interaction Models Training & Retrieval ☆732 · Updated this week
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ☆2,095 · Oct 16, 2025 · Updated 4 months ago
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. ☆2,026 · Updated this week
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆149 · Oct 27, 2024 · Updated last year
- [ACL 2025] AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark ☆166 · Oct 14, 2025 · Updated 4 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆224 · Dec 16, 2025 · Updated 2 months ago
- Generative Representational Instruction Tuning ☆687 · Jun 25, 2025 · Updated 8 months ago
- Training & evaluation library for text-based neural re-ranking and dense retrieval models built with PyTorch ☆265 · Jan 27, 2023 · Updated 3 years ago
- ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23) ☆3,782 · Oct 14, 2025 · Updated 4 months ago
- Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration ☆15 · Jun 4, 2024 · Updated last year
- Provides a common interface to many IR ranking datasets. ☆381 · Feb 20, 2026 · Updated last week
- Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-Ranking ☆25 · Apr 4, 2025 · Updated 11 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,599 · Dec 20, 2025 · Updated 2 months ago
- A large scale feature extraction tool for text-based machine learning ☆32 · Sep 6, 2022 · Updated 3 years ago
- XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval ☆61 · Jun 20, 2024 · Updated last year
- Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders ☆18 · May 23, 2025 · Updated 9 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆206 · Aug 31, 2024 · Updated last year
- ☆85 · Nov 3, 2025 · Updated 4 months ago
- This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning" ☆207 · Feb 18, 2026 · Updated 2 weeks ago
- Scalable training for dense retrieval models. ☆298 · Jun 10, 2025 · Updated 8 months ago
- Convert all of libgen to high quality markdown ☆255 · Dec 13, 2023 · Updated 2 years ago
- some common Huggingface transformers in maximal update parametrization (µP) ☆87 · Mar 14, 2022 · Updated 3 years ago
- Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval ☆52 · Jan 6, 2026 · Updated last month
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ☆239 · Feb 26, 2026 · Updated last week
- A massively multilingual modern encoder language model ☆131 · Jan 20, 2026 · Updated last month
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,915 · Updated this week
- ☆16 · Oct 20, 2025 · Updated 4 months ago
- ☆89 · Apr 3, 2025 · Updated 11 months ago
- Official software repository of S. Bruch, F. M. Nardini, C. Rulli, and R. Venturini. "Efficient Inverted Indexes for Approximate Retrieva… ☆105 · Jan 27, 2026 · Updated last month
- Information Retrieval Relevance Judging System ☆29 · Jan 17, 2022 · Updated 4 years ago
- Train Models Contrastively in Pytorch ☆777 · Mar 26, 2025 · Updated 11 months ago