xhluca/bm25s

Fast BM25 search in Python, powered by Numpy and Numba

☆1,746

Alternatives and similar repositories for bm25s

Users that are interested in bm25s are comparing it to the libraries listed below.

lightonai / pylate
Late Interaction Models Training & Retrieval
☆876 · Updated this week
AnswerDotAI / rerankers
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
☆1,626 · Dec 20, 2025 · Updated 7 months ago
mixedbread-ai / baguetter
Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp…
☆211 · Aug 31, 2024 · Updated last year
AnswerDotAI / RAGatouille
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…
☆3,943 · May 17, 2025 · Updated last year
castorini / pyserini
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
☆2,102 · Jul 16, 2026 · Updated last week
dorianbrown / rank_bm25
A Collection of BM25 Algorithms in Python
☆1,366 · May 2, 2026 · Updated 2 months ago
stanford-futuredata / ColBERT
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
☆3,903 · Oct 14, 2025 · Updated 9 months ago
MinishLab / model2vec
Fast State-of-the-Art Static Embeddings
☆2,166 · Jun 6, 2026 · Updated last month
dottxt-ai / outlines
Structured Outputs
☆15,331 · Updated this week
qdrant / fastembed
Fast, Accurate, Lightweight Python library to make State of the Art Embedding
☆3,104 · Updated this week
mixedbread-ai / batched
The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…
☆161 · Jul 14, 2025 · Updated last year
michaelfeil / infinity
Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali
☆2,892 · Mar 24, 2026 · Updated 4 months ago
beir-cellar / beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
☆2,252 · Oct 16, 2025 · Updated 9 months ago
jxmorris12 / bm25_pt
minimal pytorch implementation of bm25 (with sparse tensors)
☆105 · Oct 28, 2025 · Updated 8 months ago
illuin-tech / colpali
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
☆2,707 · Jul 13, 2026 · Updated last week
urchade / GLiNER
Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts)
☆3,428 · Updated this week
MinishLab / semhash
Fast Multimodal Semantic Deduplication & Filtering
☆953 · May 24, 2026 · Updated 2 months ago
AnswerDotAI / byaldi
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
☆851 · Jan 28, 2025 · Updated last year
AmenRa / ranx
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
☆689 · Aug 7, 2025 · Updated 11 months ago
naver / splade
SPLADE: sparse neural search (SIGIR21, SIGIR22)
☆999 · May 3, 2024 · Updated 2 years ago
AnswerDotAI / ModernBERT
Bringing BERT into modernity via both architecture changes and scaling
☆1,704 · Mar 1, 2026 · Updated 4 months ago
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,371 · Updated this week
huggingface / text-embeddings-inference
A blazing fast inference solution for text embeddings models
☆4,959 · Updated this week
FlagOpen / FlagEmbedding
Retrieval and Retrieval-augmented LLMs
☆11,981 · Apr 22, 2026 · Updated 3 months ago
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,777 · May 26, 2026 · Updated 2 months ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,223 · Updated this week
567-labs / instructor
structured outputs for llms
☆13,616 · Jul 13, 2026 · Updated last week
texttron / tevatron
Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.
☆743 · Jul 18, 2026 · Updated last week
IBM / fastfit
FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes
☆220 · Sep 18, 2025 · Updated 10 months ago
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆3,344 · Updated this week
embeddings-benchmark / mteb
MTEB: State-of-the-art evaluation of embeddings across languages and modalities
☆3,369 · Updated this week
PrithivirajDamodaran / FlashRank
Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro…
☆995 · Jul 11, 2026 · Updated 2 weeks ago
argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,051 · Updated this week
illuin-tech / contextual-embeddings
Model implementation for the contextual embeddings project
☆47 · Jun 2, 2025 · Updated last year
allenai / ir_datasets
Provides a common interface to many IR ranking datasets.
☆390 · May 28, 2026 · Updated last month
lightonai / fast-plaid
High-Performance Engine for Multi-Vector Search
☆271 · May 28, 2026 · Updated last month
softwaredoug / searcharray
Full text search that feels like a numpy array
☆311 · May 4, 2026 · Updated 2 months ago
huggingface / sentence-transformers
State-of-the-Art Embeddings, Retrieval, and Reranking
☆18,944 · Updated this week
castorini / rank_llm
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking.
☆610 · Updated this week