ayushgupta4897 / fast-dedupe
A minimalist, optimized Python package for deduplication tasks. It uses RapidFuzz internally to detect approximate duplicates within a dataset quickly and with minimal configuration.
☆17 · Updated 7 months ago
Alternatives and similar repositories for fast-dedupe
Users who are interested in fast-dedupe are comparing it to the libraries listed below.
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆117 · Updated 7 months ago
- this project will bootstrap and scaffold the projects for specific semantic search and RAG applications along with regular boiler plate c… ☆92 · Updated 10 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆136 · Updated 2 months ago
- synthetic data for ml ☆25 · Updated 9 months ago
- ☆36 · Updated 11 months ago
- Multimodal AI workloads: batch inference, model training and online serving. ☆73 · Updated 2 months ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… ☆33 · Updated last year
- A Lightweight Library for AI Observability ☆251 · Updated 8 months ago
- Simple UI for debugging correlations of text embeddings ☆298 · Updated 5 months ago
- This repository contains various RAG patterns implemented from scratch ☆20 · Updated last week
- Comprehensive Vector Data Tooling. The universal interface for all vector database, datasets and RAG platforms. Easily export, import, ba… ☆263 · Updated last week
- Named Entity Recognition using Claude Citations ☆79 · Updated 5 months ago
- Tool to migrate data into Qdrant ☆59 · Updated last week
- SUQL: Conversational Search over Structured and Unstructured Data with LLMs ☆290 · Updated last week
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆179 · Updated last year
- Iterate fast on your RAG pipelines ☆23 · Updated 4 months ago
- ☆210 · Updated 4 months ago
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd… ☆357 · Updated 2 months ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻🍳 ☆338 · Updated 5 months ago
- How far can we go with an LLM for a classification problem ☆24 · Updated 11 months ago
- An open-source tool for LLM prompt optimization. ☆698 · Updated this week
- Deep Research for your internal data ☆346 · Updated 5 months ago
- High-Performance Engine for Multi-Vector Search ☆180 · Updated last week
- This repo is the central repo for all the RAG Evaluation reference material and partner workshop ☆76 · Updated 6 months ago
- A curated list of open source repositories for AI Engineers ☆119 · Updated 7 months ago
- Ranking LLMs on agentic tasks ☆198 · Updated last month
- Tutorial for building LLM router ☆233 · Updated last year
- Generalist and Lightweight Model for Text Classification ☆164 · Updated 4 months ago
- ☆211 · Updated 5 months ago
- ☆146 · Updated last year