Fast Multimodal Semantic Deduplication & Filtering
☆897Jan 20, 2026Updated 2 months ago
Alternatives and similar repositories for semhash
Users who are interested in semhash compare it to the libraries listed below
- Lightweight Nearest Neighbors with Flexible Backends☆335Dec 30, 2025Updated 2 months ago
- Pre-train Static Word Embeddings☆95Sep 9, 2025Updated 6 months ago
- Fast State-of-the-Art Static Embeddings☆2,011Mar 12, 2026Updated last week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆3,121Mar 9, 2026Updated last week
- Late Interaction Models Training & Retrieval☆743Mar 6, 2026Updated 2 weeks ago
- Efficient few-shot learning with Sentence Transformers☆2,697Dec 11, 2025Updated 3 months ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024☆2,915Updated this week
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas…☆237Mar 10, 2026Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.☆2,956Updated this week
- Fast lexical search implementing BM25 in Python☆1,589Updated this week
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba☆37Oct 16, 2025Updated 5 months ago
- Build datasets using natural language☆570Sep 19, 2025Updated 6 months ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…☆3,882May 17, 2025Updated 10 months ago
- Easily embed, cluster and semantically label text datasets☆600Mar 28, 2024Updated last year
- Structured Outputs☆13,564Mar 9, 2026Updated last week
- Generalist and Lightweight Model for Text Classification☆198Feb 17, 2026Updated last month
- Bringing BERT into modernity via both architecture changes and scaling☆1,642Mar 1, 2026Updated 2 weeks ago
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding☆2,780Mar 12, 2026Updated last week
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets☆4,896Updated this week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes☆214Sep 18, 2025Updated 6 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends☆2,339Mar 9, 2026Updated last week
- Python library to use Pleias-RAG models☆68May 1, 2025Updated 10 months ago
- Curated list of datasets and tools for post-training.☆4,344Mar 9, 2026Updated last week
- Robust and fast topic models with sentence-transformers.☆95Updated this week
- ☆566Nov 20, 2024Updated last year
- SpanMarker for Named Entity Recognition☆465Jan 8, 2025Updated last year
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.☆1,604Dec 20, 2025Updated 3 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp…☆209Aug 31, 2024Updated last year
- A flexible, adaptive classification system for dynamic text classification☆539Oct 7, 2025Updated 5 months ago
- ☆162Dec 2, 2024Updated last year
- Leveraging BERT and c-TF-IDF to create easily interpretable topics.☆7,452Feb 20, 2026Updated last month
- Lightweight hallucination detection framework for RAG applications☆533Mar 6, 2026Updated 2 weeks ago
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆81Feb 10, 2026Updated last month
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text)☆258Jun 11, 2025Updated 9 months ago
- Simply, faster, sentence-transformers☆144Aug 27, 2024Updated last year
- Active Learning for Text Classification in Python☆637Mar 8, 2026Updated last week
- ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)☆3,799Oct 14, 2025Updated 5 months ago
- Code for KaLM-Embedding models☆115Jun 30, 2025Updated 8 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…☆159Jul 14, 2025Updated 8 months ago