minimal pytorch implementation of bm25 (with sparse tensors)
☆104Oct 28, 2025Updated 4 months ago
Alternatives and similar repositories for bm25_pt
Users that are interested in bm25_pt are comparing it to the libraries listed below
- ☆13Aug 23, 2024Updated last year
- ☆27Aug 1, 2024Updated last year
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy☆1,507Feb 17, 2026Updated 2 weeks ago
- Fast and differentiable hidden Markov model in C++☆19Jan 20, 2023Updated 3 years ago
- Late Interaction Models Training & Retrieval☆740Updated this week
- Measuring and Controlling Persona Drift in Language Model Dialogs☆22Feb 26, 2024Updated 2 years ago
- Using modal.com to process FineWeb-edu data☆20Apr 5, 2025Updated 11 months ago
- code for training & evaluating Contextual Document Embedding models☆201May 14, 2025Updated 9 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API☆48Sep 26, 2024Updated last year
- ☆24Feb 4, 2026Updated last month
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆224Dec 16, 2025Updated 2 months ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.☆20Jun 3, 2024Updated last year
- coded with and corrected by Google Anti-Gravity☆13Nov 23, 2025Updated 3 months ago
- High performance pytorch modules☆17Jan 14, 2023Updated 3 years ago
- Hugging Face RoBERTa with Flash Attention 2☆24Sep 14, 2025Updated 5 months ago
- Model implementation for the contextual embeddings project☆41Jun 2, 2025Updated 9 months ago
- utilities for loading and running text embeddings with onnx☆45Aug 16, 2025Updated 6 months ago
- A repository for research on medium sized language models.☆78May 23, 2024Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆76Oct 19, 2024Updated last year
- High-performance tokenized language data-loader for Python C++ extension☆14Jul 22, 2024Updated last year
- Label shift estimation for transfer difficulty with Familiarity.☆10Feb 4, 2025Updated last year
- ☆13Nov 27, 2025Updated 3 months ago
- Frontend (and soon also middleware and backend) for a new, open-source image generation platform.☆14Nov 5, 2022Updated 3 years ago
- Library for fast text representation and classification.☆31Jan 9, 2024Updated 2 years ago
- GoldFinch and other hybrid transformer components☆45Jul 20, 2024Updated last year
- Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.☆2,033Updated this week
- Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and…☆45Oct 10, 2025Updated 4 months ago
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati…☆51May 7, 2025Updated 10 months ago
- ☆57Jan 26, 2025Updated last year
- RWKV-7 mini☆12Mar 29, 2025Updated 11 months ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated last year
- ☆24Jan 30, 2025Updated last year
- Faster Learned Sparse Retrieval with Block-Max Pruning. ACM SIGIR 2024.☆35Jan 14, 2026Updated last month
- SPLADE: sparse neural search (SIGIR21, SIGIR22)☆980May 3, 2024Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆33Aug 14, 2024Updated last year
- Code for NeurIPS 2023 paper "Non-autoregressive Machine Translation with Probabilistic Context-free Grammar".☆12Jan 4, 2024Updated 2 years ago
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search.☆33Oct 8, 2025Updated 5 months ago
- ☆15Nov 13, 2025Updated 3 months ago
- Latent Large Language Models☆19Aug 24, 2024Updated last year