ayushgupta4897 / fast-dedupe
A minimalist but optimized Python package for deduplication tasks leveraging RapidFuzz internally, enabling super-fast approximate duplicate detection within a dataset with minimal config.
☆18 Updated 2 months ago
Alternatives and similar repositories for fast-dedupe
Users who are interested in fast-dedupe are comparing it to the libraries listed below.
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆105 Updated 2 months ago
- A curated list of open source repositories for AI Engineers ☆113 Updated 2 months ago
- Framework for building data agent workflows ☆82 Updated 9 months ago
- Solving data for LLMs - Create quality synthetic datasets! ☆148 Updated 4 months ago
- Terminal-based AI Coding Agent, similar to Claude Code, OpenAI Codex etc. but works with many more LLMs e.g. Gemini, Groq, Deepseek ☆130 Updated last month
- ☆33 Updated 6 months ago
- this project will bootstrap and scaffold the projects for specific semantic search and RAG applications along with regular boiler plate c… ☆89 Updated 5 months ago
- Iterate fast on your RAG pipelines ☆23 Updated 3 months ago
- Named Entity Recognition using Claude Citations ☆74 Updated 2 months ago
- Deep Research for your internal data ☆322 Updated last week
- Comprehensive Vector Data Tooling. The universal interface for all vector database, datasets and RAG platforms. Easily export, import, ba… ☆243 Updated this week
- A Lightweight Library for AI Observability ☆243 Updated 3 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆132 Updated last month
- ☆36 Updated 5 months ago
- ☆27 Updated 4 months ago
- ☆210 Updated 11 months ago
- Together Open Deep Research ☆308 Updated last month
- A practical RAG where you can download and chat with github repo ☆80 Updated 3 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆131 Updated last month
- Lightweight Non-Parametric Embedding Fine-Tuning ☆25 Updated 8 months ago
- ☆122 Updated 3 months ago
- Generalist and Lightweight Model for Text Classification ☆128 Updated 2 weeks ago
- 🦄 ai that works - every tuesday 10 AM PST ☆97 Updated this week
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆105 Updated last month
- A framework for fine-tuning retrieval-augmented generation (RAG) systems. ☆87 Updated this week
- ☆34 Updated last month
- A reimplementation of langgraph's customer support example in Rasa's CALM paradigm and a quantitative evaluation of the 2 approaches ☆80 Updated 2 months ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… ☆31 Updated 9 months ago
- A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use… ☆118 Updated 3 weeks ago
- Simple UI for debugging correlations of text embeddings ☆256 Updated last week