Fast Multimodal Semantic Deduplication & Filtering
☆937May 24, 2026Updated last month
Alternatives and similar repositories for semhash
Users who are interested in semhash are comparing it to the libraries listed below.
- Lightweight Nearest Neighbors with Flexible Backends☆345May 24, 2026Updated last month
- Pre-train Static Word Embeddings☆106Jun 9, 2026Updated 3 weeks ago
- Fast State-of-the-Art Static Embeddings☆2,132Jun 6, 2026Updated 3 weeks ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆3,300Jun 22, 2026Updated last week
- Late Interaction Models Training & Retrieval☆859Updated this week
- Efficient few-shot learning with Sentence Transformers☆2,755May 26, 2026Updated last month
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas…☆244Jun 17, 2026Updated last week
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts)☆3,341Jun 16, 2026Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.☆3,138May 26, 2026Updated last month
- Fast BM25 search in Python, powered by Numpy and Numba☆1,715Jun 11, 2026Updated 2 weeks ago
- Generalist and Lightweight Model for Text Classification☆226Jun 15, 2026Updated 2 weeks ago
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba☆38Oct 16, 2025Updated 8 months ago
- Structured Outputs☆14,273Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…☆3,938May 17, 2025Updated last year
- Build datasets using natural language☆579Sep 19, 2025Updated 9 months ago
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets☆5,014Jun 22, 2026Updated last week
- Bringing BERT into modernity via both architecture changes and scaling☆1,696Mar 1, 2026Updated 3 months ago
- Easily embed, cluster and semantically label text datasets☆609Mar 28, 2024Updated 2 years ago
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding☆3,058Jun 23, 2026Updated last week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends☆2,463Jun 23, 2026Updated last week
- Robust and fast topic models with sentence-transformers.☆115Jun 11, 2026Updated 2 weeks ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes☆220Sep 18, 2025Updated 9 months ago
- Curated list of datasets and tools for post-training.☆4,665Apr 29, 2026Updated 2 months ago
- Python library to use Pleias-RAG models☆72Jun 20, 2026Updated last week
- SpanMarker for Named Entity Recognition☆476Apr 10, 2026Updated 2 months ago
- ☆572Nov 20, 2024Updated last year
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.☆1,621Dec 20, 2025Updated 6 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp…☆211Aug 31, 2024Updated last year
- ☆162Dec 2, 2024Updated last year
- Leveraging BERT and c-TF-IDF to create easily interpretable topics.☆7,716May 13, 2026Updated last month
- A flexible, adaptive classification system for dynamic text classification☆566Oct 7, 2025Updated 8 months ago
- Lightweight hallucination detection framework for RAG applications☆578Updated this week
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆89Feb 10, 2026Updated 4 months ago
- Simply, faster, sentence-transformers☆144Aug 27, 2024Updated last year
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text)☆285Mar 30, 2026Updated 2 months ago
- Active Learning for Text Classification in Python☆645May 24, 2026Updated last month
- Everything about the SmolLM and SmolVLM family of models☆3,826May 26, 2026Updated last month
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…☆161Jul 14, 2025Updated 11 months ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a…☆2,127Dec 3, 2025Updated 6 months ago