xhluca / bm25s
Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy
★ 899 · Updated last week
Related projects
Alternatives and complementary repositories for bm25s
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★ 1,095 · Updated last week
- Evaluate your LLM's response with Prometheus and GPT4 🎯 ★ 797 · Updated 2 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★ 1,634 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ★ 811 · Updated this week
- Easily embed, cluster and semantically label text datasets ★ 462 · Updated 7 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★ 2,045 · Updated this week
- Generative Representational Instruction Tuning ★ 567 · Updated this week
- Automated Evaluation of RAG Systems ★ 484 · Updated 2 weeks ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ★ 617 · Updated last week
- ReFT: Representation Finetuning for Language Models ★ 1,159 · Updated 2 weeks ago
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels ★ 308 · Updated 5 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ★ 246 · Updated last month
- Distill a Small Static Model from any Sentence Transformer ★ 460 · Updated this week
- ★ 451 · Updated 3 weeks ago
- The code used to train and run inference with the ColPali architecture. ★ 1,132 · Updated this week
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ★ 665 · Updated last month
- RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. ★ 349 · Updated last week
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ★ 780 · Updated 6 months ago
- Neural Search ★ 344 · Updated 5 months ago
- Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard ★ 485 · Updated last week
- The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval ★ 966 · Updated 2 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ★ 385 · Updated 9 months ago
- awesome synthetic (text) datasets ★ 242 · Updated 3 weeks ago
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ★ 504 · Updated this week
- Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders' ★ 1,296 · Updated last month
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ★ 1,424 · Updated last week
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) ★ 332 · Updated last month
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. ★ 845 · Updated 3 months ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ★ 798 · Updated 2 weeks ago