mixedbread-ai / baguetter
Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, implementing, and testing new search methods. Baguetter supports sparse (traditional), dense (semantic), and hybrid retrieval methods.
☆171 Updated 5 months ago
Alternatives and similar repositories for baguetter:
Users who are interested in baguetter are comparing it to the libraries listed below.
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆119 Updated last month
- Late Interaction Models Training & Retrieval ☆236 Updated last week
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆79 Updated 10 months ago
- Python API for https://vespa.ai, the open big data serving engine ☆113 Updated this week
- Efficient vector database for hundreds of millions of embeddings. ☆206 Updated 8 months ago
- ☆207 Updated 7 months ago
- minimal pytorch implementation of bm25 (with sparse tensors) ☆97 Updated 11 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆184 Updated 4 months ago
- Generalist and Lightweight Model for Text Classification ☆65 Updated 3 weeks ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆239 Updated this week
- code for training & evaluating Contextual Document Embedding models ☆173 Updated last month
- awesome synthetic (text) datasets ☆259 Updated 3 months ago
- Vector Database with support for late interaction and token level embeddings. ☆52 Updated 4 months ago
- Neural Search ☆350 Updated 8 months ago
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ☆402 Updated this week
- ☆62 Updated 6 months ago
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆173 Updated this week
- Fast Semantic Text Deduplication ☆511 Updated 2 weeks ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆101 Updated last year
- ☆147 Updated 2 months ago
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels ☆313 Updated last month
- Notebooks for training universal 0-shot classifiers on many different tasks ☆120 Updated last month
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆406 Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆106 Updated this week
- data cleaning and curation for unstructured text ☆329 Updated 6 months ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻🍳 ☆253 Updated last month
- experiments with inference on llama ☆104 Updated 8 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆67 Updated 3 months ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. ☆37 Updated 10 months ago