instructkr/bb25

bb25 is a fast, self-contained BM25 + Bayesian calibration implementation with a minimal Python API.

☆148
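The description above names the two pieces bb25 combines: BM25 ranking and a calibration step. For orientation only, here is a minimal pure-Python sketch of standard Okapi BM25 scoring, followed by a Platt-style logistic (sigmoid) mapping of raw scores into (0, 1). This is not bb25's API and not its Bayesian calibration method. The function names `bm25_scores` and `logistic_calibrate`, the defaults `k1=1.2` and `b=0.75`, and the sigmoid parameters `alpha` and `beta` are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25.

    Illustrative sketch; not the bb25 API.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of docs that contain each term at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF; the log argument is > 1, so it is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            norm = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


def logistic_calibrate(score, alpha=1.0, beta=0.0):
    """Map a raw score to (0, 1) with a Platt-style sigmoid.

    alpha and beta are hypothetical parameters; in practice they would be
    fit on labeled relevance data. This is a generic stand-in, not bb25's
    Bayesian calibration.
    """
    return 1.0 / (1.0 + math.exp(-(alpha * score + beta)))


docs = [
    "fast bm25 search in python".split(),
    "bayesian calibration of retrieval scores".split(),
    "cooking recipes for pasta".split(),
]
query = "bm25 retrieval".split()
scores = bm25_scores(query, docs)
probs = [logistic_calibrate(s) for s in scores]
print(scores)
print(probs)
```

A larger `k1` lets repeated occurrences of a term keep adding to the score for longer. `b` controls how strongly documents longer than average are penalized, with `b=0` disabling length normalization entirely.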

Alternatives and similar repositories for bb25

Users interested in bb25 are comparing it to the libraries listed below.

cognica-io / bayesian-bm25
Bayesian probability transforms for BM25 retrieval scores
☆77 · Jun 20, 2026 · Updated last month

instructkr / reranker-simple-benchmark
Make running benchmarks simple yet maintainable, again. Currently supports only Korean cross-encoders.
☆35 · Dec 2, 2025 · Updated 7 months ago

instructkr / base-cacheable-class
A flexible base class for adding caching capabilities to your Python classes.
☆17 · Jul 9, 2025 · Updated last year

Zerohertz / Instruct_KR_2025_Summer_Meetup_vLLM
🎹 Instruct.KR 2025 Summer Meetup: Open-Source LLMs, All the Way to Production with vLLM 🎹
☆23 · Aug 2, 2025 · Updated 11 months ago

stephantul / pynife
Nearly Inference Free Embeddings: make your RAG queries 500x faster
☆80 · Apr 27, 2026 · Updated 2 months ago

lightonai / pylate-rs
PyLate efficient inference engine
☆87 · Jan 7, 2026 · Updated 6 months ago

roipony / flash-maxsim
☆27 · Jun 11, 2026 · Updated last month

TusKANNy / awesome-multivector-retrieval
An extensive and commented list of resources on Late-Interaction Multivector Retrieval.
☆69 · Updated this week

ejaasaari / lemur
[ICML'26] LEMUR reduces multi-vector retrieval for late-interaction models such as ColBERT to regular single-vector retrieval.
☆31 · Jun 21, 2026 · Updated last month

hseb-benchmark / hseb
HSEB: Hybrid Search Engine Benchmark
☆21 · Oct 5, 2025 · Updated 9 months ago

vespaai-playground / vespaembed
No-code tool for fine-tuning embedding models
☆30 · Updated this week

instructkr / rvllm-serverless
rvLLM for the RunPod serverless environment: a lightweight, instant-startup vLLM replacement
☆40 · Apr 1, 2026 · Updated 3 months ago

realsigridjin / agentjson
A parser that repairs broken JSON output for AI agent pipelines
☆116 · Dec 14, 2025 · Updated 7 months ago

lightonai / fast-plaid
High-Performance Engine for Multi-Vector Search
☆271 · May 28, 2026 · Updated last month

DunZhang / Jasper-Token-Compression-Training
Training code for Jasper-Token-Compression-600M
☆20 · Nov 19, 2025 · Updated 8 months ago

feyninc / tokie
🍡 30x faster tokenization for every HuggingFace model
☆49 · Updated this week

illuin-tech / contextual-embeddings
Model implementation for the contextual embeddings project
☆47 · Jun 2, 2025 · Updated last year

kyopark2014 / llm-agent
Shows how to deploy and use an LLM-based agent.
☆19 · Mar 1, 2025 · Updated last year

Marker-Inc-Korea / AutoRAG-example-korean-embedding-benchmark
AutoRAG example for benchmarking Korean embeddings.
☆46 · Oct 2, 2024 · Updated last year

mjeensung / xtr-pytorch
☆19 · May 16, 2024 · Updated 2 years ago

ssisOneTeam / Korean-Embedding-Model-Performance-Benchmark-for-Retriever
Korean Sentence Embedding Model Performance Benchmark for RAG
☆49 · Jan 27, 2025 · Updated last year

rasyosef / splade-index
Fast search index for SPLADE sparse retrieval models, implemented in Python using NumPy and Numba
☆38 · Oct 16, 2025 · Updated 9 months ago

DSBA-Lab / Contrastive-Accumulation
☆14 · Jul 7, 2024 · Updated 2 years ago

ls3-lab / QueryGym
A lightweight, reproducible toolkit for LLM-based query reformulation.
☆41 · Updated this week

VectorSpaceLab / agentic-search
Advancing search on top of AI agents
☆31 · Jun 9, 2026 · Updated last month

stephantul / skeletoken
Data models for Hugging Face tokenizers
☆109 · Jun 18, 2026 · Updated last month

lightonai / ducksearch
Efficient BM25 with DuckDB 🦆
☆68 · Dec 20, 2024 · Updated last year

Ouro-labs / ourocode
Ouroboros-native CLI with seamless MCP orchestration
☆17 · Jun 14, 2026 · Updated last month

Lumen-Labs / cpp-chunker
Implementation of a fast semantic chunker in C++, installable in Python 3.7+ projects.
☆22 · Sep 20, 2025 · Updated 10 months ago

AnswerDotAI / fastkmeans
☆102 · Jul 4, 2025 · Updated last year

lightonai / fastkmeans-rs
A Rust rewrite of FastKMeans for CPU-based clustering
☆17 · Jun 29, 2026 · Updated 3 weeks ago

lightonai / next-plaid
NextPlaid, ColGREP: Multi-vector search, from database to coding agents.
☆520 · Updated this week

TusKANNy / tachiom
Official repository of TACHIOM.
☆63 · Jul 17, 2026 · Updated last week

Pringled / pyversity
Fast Diversification for Search & Retrieval
☆493 · May 24, 2026 · Updated 2 months ago

daekeun-ml / evaluate-llm-on-korean-dataset
Performs benchmarking on two Korean datasets with minimal time and effort.
☆45 · Jan 22, 2026 · Updated 6 months ago

DeployQL / awesome-multi-vector
A list of multi-vector retrieval resources
☆19 · May 29, 2024 · Updated 2 years ago

Alibaba-NLP / E2Rank
E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker
☆58 · Jul 1, 2026 · Updated 3 weeks ago

recombee / CompresSAE
Sparse Embedding Compression for Scalable Retrieval in Recommender Systems
☆39 · Nov 21, 2025 · Updated 8 months ago

jina-ai / embedding-inversion-demo
Embedding Inversion via Conditional Masked Diffusion: recover original text from embedding vectors using parallel denoising. Live demo + …
☆60 · Mar 7, 2026 · Updated 4 months ago