microsoft / MS-MARCO-Web-Search
A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
☆308 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for MS-MARCO-Web-Search
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆349 · Updated last week
- ☆451 · Updated 3 weeks ago
- Generative Representational Instruction Tuning ☆567 · Updated this week
- This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning" ☆197 · Updated 6 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆236 · Updated 4 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆162 · Updated 2 months ago
- awesome synthetic (text) datasets ☆242 · Updated 3 weeks ago
- Inquisitive Parrots for Search ☆178 · Updated 8 months ago
- Is ChatGPT Good at Search? LLMs as Re-Ranking Agent [EMNLP 2023 Outstanding Paper Award] ☆526 · Updated 8 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆899 · Updated last week
- A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages. ☆168 · Updated 3 months ago
- Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning ☆685 · Updated last year
- Dense X Retrieval: What Retrieval Granularity Should We Use? ☆132 · Updated 10 months ago
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ☆328 · Updated this week
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆438 · Updated 8 months ago
- Easily embed, cluster and semantically label text datasets ☆462 · Updated 7 months ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆190 · Updated this week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆183 · Updated last month
- AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark ☆106 · Updated last month
- Tevatron - A flexible toolkit for neural retrieval research and development. ☆524 · Updated last month
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆811 · Updated this week
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆122 · Updated 8 months ago
- HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels ☆450 · Updated last year
- Benchmarking library for RAG ☆122 · Updated this week
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆448 · Updated 8 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆115 · Updated last week
- Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard ☆485 · Updated last week
- batched loras ☆336 · Updated last year
- ☆204 · Updated 4 months ago
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆780 · Updated 6 months ago