ayushgupta4897 / fast-dedupe
A minimalist, optimized Python package for deduplication that uses RapidFuzz internally to detect approximate duplicates within a dataset quickly and with minimal configuration.
☆18 · Updated 8 months ago
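The idea of approximate duplicate detection described above can be illustrated with a minimal sketch. This is not fast-dedupe's API, and it does not use RapidFuzz: it substitutes the standard library's `difflib.SequenceMatcher` so that it runs without extra dependencies (RapidFuzz's `fuzz.ratio` likewise returns a 0-100 score, computed much faster). The names `similarity` and `dedupe` and the default threshold of 85 are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe(items: list[str], threshold: float = 85.0) -> tuple[list[str], dict[str, list[str]]]:
    """Greedy approximate deduplication (hypothetical helper, not a library API).

    The first occurrence of each group of near-identical strings is kept;
    later items scoring at or above `threshold` against a kept item are
    recorded as its duplicates.
    """
    kept: list[str] = []
    duplicates: dict[str, list[str]] = {}
    for item in items:
        for representative in kept:
            if similarity(item, representative) >= threshold:
                duplicates.setdefault(representative, []).append(item)
                break
        else:
            # No kept item was similar enough, so this one starts a new group.
            kept.append(item)
    return kept, duplicates


names = ["Apple iPhone 12", "Apple iPhone12", "Samsung Galaxy", "apple iphone 12"]
unique, dupes = dedupe(names)
print(unique)
print(dupes)
```

This greedy pass compares each item against every kept representative, so it is quadratic in the worst case. A dedicated library would typically reduce that cost with faster scorers and batched matching.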
Alternatives and similar repositories for fast-dedupe
Users who are interested in fast-dedupe are comparing it to the libraries listed below.
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆119 · Updated 8 months ago
- ☆92 · Updated 2 months ago
- Synthetic data for ML ☆25 · Updated 11 months ago
- This repository contains various RAG patterns implemented from scratch ☆20 · Updated 2 weeks ago
- Multimodal AI workloads: batch inference, model training and online serving. ☆105 · Updated 4 months ago
- This project will bootstrap and scaffold the projects for specific semantic search and RAG applications along with regular boilerplate c… ☆91 · Updated last year
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆64 · Updated last year
- Generalist and Lightweight Model for Text Classification ☆167 · Updated 3 weeks ago
- ☆37 · Updated last year
- Simple UI for debugging correlations of text embeddings ☆306 · Updated 7 months ago
- ☆18 · Updated last year
- Comprehensive Vector Data Tooling. The universal interface for all vector database, datasets and RAG platforms. Easily export, import, ba… ☆264 · Updated last week
- A curated list of open source repositories for AI Engineers ☆123 · Updated 9 months ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… ☆33 · Updated last year
- Named Entity Recognition using Claude Citations ☆79 · Updated 6 months ago
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM. ☆140 · Updated 4 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆181 · Updated last year
- Optimized Large Language Models for Financial Applications – Efficient, Scalable, and Domain-Specific AI for Finance. ☆50 · Updated 5 months ago
- A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal… ☆12 · Updated last year
- How to build the best search, one step at a time! ☆231 · Updated last month
- 🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch or based on seed data. ☆547 · Updated last week
- Fine-tune an LLM to perform batch inference and online serving. ☆115 · Updated 7 months ago
- Materials for the Ultimate Hybrid Search Workshop ☆44 · Updated last year
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆348 · Updated 6 months ago
- Official Implementation of "Affordable AI Assistants with Knowledge Graph of Thoughts" ☆201 · Updated this week
- Unified Schema-Based Information Extraction ☆403 · Updated this week
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text) ☆253 · Updated 6 months ago
- Tool to migrate data into Qdrant ☆63 · Updated this week
- A reimplementation of langgraph's customer support example in Rasa's CALM paradigm and a quantitative evaluation of the two approaches ☆81 · Updated 9 months ago
- ☆212 · Updated 6 months ago