xhluca / bm25s
Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy
☆1,139 Updated 2 weeks ago
Alternatives and similar repositories for bm25s:
Users who are interested in bm25s are comparing it to the libraries listed below:
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,400 Updated last month
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆788 Updated 5 months ago
- Bringing BERT into modernity via both architecture changes and scaling ☆1,342 Updated last month
- Easily embed, cluster and semantically label text datasets ☆530 Updated last year
- Fast Semantic Text Deduplication & Filtering ☆654 Updated last week
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆439 Updated 2 weeks ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆776 Updated 3 months ago
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆839 Updated last year
- Enforce the output format (JSON Schema, Regex etc) of a language model ☆1,791 Updated 2 months ago
- Generative Representational Instruction Tuning ☆624 Updated last month
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,671 Updated last week
- Late Interaction Models Training & Retrieval ☆288 Updated 3 weeks ago
- The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval ☆1,200 Updated 8 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆377 Updated 4 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,482 Updated this week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆299 Updated last month
- Automated Evaluation of RAG Systems ☆582 Updated last month
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆421 Updated last year
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. ☆1,813 Updated last week
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆930 Updated last week
- Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders' ☆1,500 Updated 3 months ago
- ⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍 ☆546 Updated 10 months ago
- HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels ☆530 Updated 4 months ago
- Efficient Retrieval Augmentation and Generation Framework ☆1,531 Updated 3 months ago
- MTEB: Massive Text Embedding Benchmark ☆2,469 Updated this week
- A Collection of BM25 Algorithms in Python ☆1,156 Updated 6 months ago
- Train Models Contrastively in Pytorch ☆700 Updated last month
- A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets. ☆1,788 Updated 2 months ago
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,464 Updated 2 months ago
- SGPT: GPT Sentence Embeddings for Semantic Search ☆866 Updated last year