Fast Multimodal Semantic Deduplication & Filtering
☆909Jan 20, 2026Updated 2 months ago
Alternatives and similar repositories for semhash
Users that are interested in semhash are comparing it to the libraries listed below.
- Lightweight Nearest Neighbors with Flexible Backends☆336Updated this week
- Pre-train Static Word Embeddings☆98Mar 27, 2026Updated last week
- Fast State-of-the-Art Static Embeddings☆2,020Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆3,155Mar 30, 2026Updated last week
- Late Interaction Models Training & Retrieval☆778Mar 6, 2026Updated last month
- Efficient few-shot learning with Sentence Transformers☆2,705Apr 2, 2026Updated last week
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024☆3,029Mar 31, 2026Updated last week
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas…☆239Apr 1, 2026Updated last week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.☆2,978Apr 2, 2026Updated last week
- Fast BM25 search in Python, powered by Numpy and Numba☆1,615Updated this week
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba☆38Oct 16, 2025Updated 5 months ago
- Generalist and Lightweight Model for Text Classification☆206Feb 17, 2026Updated last month
- Build datasets using natural language☆573Sep 19, 2025Updated 6 months ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…☆3,897May 17, 2025Updated 10 months ago
- Structured Outputs☆13,631Mar 26, 2026Updated 2 weeks ago
- Bringing BERT into modernity via both architecture changes and scaling☆1,652Mar 1, 2026Updated last month
- Easily embed, cluster and semantically label text datasets☆602Mar 28, 2024Updated 2 years ago
- Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets☆4,925Updated this week
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding☆2,834Mar 30, 2026Updated last week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends☆2,364Apr 2, 2026Updated last week
- Robust and fast topic models with sentence-transformers.☆97Updated this week
- Curated list of datasets and tools for post-training.☆4,418Mar 9, 2026Updated last month
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes☆216Sep 18, 2025Updated 6 months ago
- Python library to use Pleias-RAG models☆71May 1, 2025Updated 11 months ago
- SpanMarker for Named Entity Recognition☆465Jan 8, 2025Updated last year
- ☆567Nov 20, 2024Updated last year
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.☆1,606Dec 20, 2025Updated 3 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp…☆209Aug 31, 2024Updated last year
- ☆162Dec 2, 2024Updated last year
- A flexible, adaptive classification system for dynamic text classification☆545Oct 7, 2025Updated 6 months ago
- Leveraging BERT and c-TF-IDF to create easily interpretable topics.☆7,508Feb 20, 2026Updated last month
- Lightweight hallucination detection framework for RAG applications☆538Mar 6, 2026Updated last month
- Truly flash implementation of DeBERTa disentangled attention mechanism.☆84Feb 10, 2026Updated last month
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text)☆263Mar 30, 2026Updated last week
- Simply, faster, sentence-transformers☆144Aug 27, 2024Updated last year
- Active Learning for Text Classification in Python☆637Apr 1, 2026Updated last week
- Everything about the SmolLM and SmolVLM family of models☆3,696Apr 2, 2026Updated last week
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…☆160Jul 14, 2025Updated 8 months ago
- ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)☆3,822Oct 14, 2025Updated 5 months ago