MinishLab / semhash
Fast Semantic Text Deduplication
☆472, updated this week
Alternatives and similar repositories for semhash:
Users who are interested in semhash are comparing it to the libraries listed below.
- The Fastest State-of-the-Art Static Embeddings in the World ☆954, updated this week
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆251, updated last month
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆990, updated 2 weeks ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,261, updated last week
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆170, updated 5 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆314, updated last month
- Easily embed, cluster and semantically label text datasets ☆494, updated 10 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆235, updated last week
- Late Interaction Models Training & Retrieval ☆229, updated this week
- ☆201, updated last month
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆707, updated this week
- A Lightweight Library for AI Observability ☆232, updated this week
- A prompting library ☆154, updated 4 months ago
- awesome synthetic (text) datasets ☆256, updated 3 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆401, updated 11 months ago
- Lightweight Nearest Neighbors with Flexible Backends ☆229, updated this week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆182, updated 3 months ago
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆220, updated 2 weeks ago
- ☆207, updated 6 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆145, updated 4 months ago
- Build datasets using natural language ☆310, updated this week
- Neural Search ☆349, updated 7 months ago
- Unattended Lightweight Text Classifiers with LLM Embeddings ☆182, updated 4 months ago
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ☆523, updated last month
- Synthetic Data curation for post-training and structured data extraction ☆575, updated this week
- ☆666, updated this week
- An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy. ☆279, updated this week
- Process PDFs, Word documents and more with spaCy ☆346, updated last month
- clean & curate your data with LLMs. ☆473, updated 7 months ago
- ☆147, updated last month