ayushgupta4897/fast-dedupe

A minimalist but optimized Python package for deduplication tasks. It uses RapidFuzz internally to enable fast approximate duplicate detection within a dataset with minimal configuration.

☆18
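Approximate duplicate detection of this kind comes down to fuzzy string matching: score records against each other and treat any pair above a similarity threshold as duplicates. The sketch below only illustrates that idea and is not fast-dedupe's API. The helper `find_near_duplicates` and its `threshold` parameter are hypothetical, and the sketch swaps in Python's standard-library `difflib.SequenceMatcher` for RapidFuzz so that it runs without extra dependencies.

```python
from difflib import SequenceMatcher


def find_near_duplicates(items, threshold=0.85):
    """Greedily group strings that look like near-duplicates.

    Hypothetical helper, not fast-dedupe's API. Uses difflib's
    SequenceMatcher.ratio() (0.0 to 1.0) as a stand-in for a RapidFuzz scorer.
    """
    groups = []  # each group is a list; its first element is the representative
    for item in items:
        normalized = item.lower().strip()
        for group in groups:
            representative = group[0].lower().strip()
            # Join the first group whose representative is similar enough.
            if SequenceMatcher(None, normalized, representative).ratio() >= threshold:
                group.append(item)
                break
        else:
            # No sufficiently similar group found: start a new one.
            groups.append([item])
    return groups


data = ["Apple iPhone 12", "apple iphone12", "Samsung Galaxy S21", "Samsung Galaxy S21 "]
groups = find_near_duplicates(data)
print(groups)
```

Each record is compared against the representative of every existing group, so this greedy approach is quadratic in the worst case. A RapidFuzz-based tool would typically replace `SequenceMatcher.ratio()` with one of RapidFuzz's faster scorers, such as `rapidfuzz.fuzz.ratio` (which returns a score from 0 to 100 rather than 0.0 to 1.0).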

Alternatives and similar repositories for fast-dedupe

Users interested in fast-dedupe are comparing it to the libraries listed below.

ayushgupta4897 / embedDB
EmbedDB is an ultra-lightweight vector database designed for rapid prototyping of semantic search and RAG applications. The entire implem…
☆21 · Mar 24, 2025 · Updated last year
TrevorW-code / fraud
Synthetic data for ML
☆25 · Jan 30, 2025 · Updated last year
huggingface / dataset-dedupe-estimator
Parquet dedupe estimator
☆27 · May 26, 2026 · Updated 2 months ago
circle-hit / MuCDN
Code for the COLING 2022 accepted paper titled "MuCDN: Mutual Conversational Detachment Network for Emotion Recognition in Multi-Party Conver…
☆10 · Jul 21, 2023 · Updated 3 years ago
kinesiatricssxilm14 / CodeRepoQA
CodeRepoQA dataset
☆15 · Feb 19, 2025 · Updated last year
ekshaks / ragpipe
Iterate fast on your RAG pipelines
☆24 · Jun 21, 2025 · Updated last year
kyryl-opens-ml / fine-tune-llms-in-2024-with-trl
☆12 · Apr 22, 2024 · Updated 2 years ago
jranaraki / vllm-tuner
An intelligent tuner for vLLM that automatically monitors GPU metrics and uses Bayesian optimization to tune parameters
☆66 · Mar 12, 2026 · Updated 4 months ago
kislerdm / pyarch
A tool to visualise the architecture of Python packages
☆10 · Aug 16, 2023 · Updated 2 years ago
veekaybee / blusky
Playing with Python Bluesky SDK
☆15 · Nov 18, 2024 · Updated last year
brainstory / backend
Notes on how to set up your backend instance
☆11 · May 29, 2024 · Updated 2 years ago
godatadriven / ducklake-blog-1
Example files used in the DuckDB - Unity Catalog blog
☆10 · Dec 6, 2024 · Updated last year
UKPLab / EACL21-personalized-conversational-system
☆12 · Nov 19, 2022 · Updated 3 years ago
Meesho / llm_calculator
☆13 · Jul 17, 2026 · Updated last week
taidopurason / tokenizer-extension
☆15 · Dec 4, 2025 · Updated 7 months ago
aws-samples / automl-pipeline-with-autogluon-sagemaker-lambda
A code-free AutoML pipeline with AutoGluon, Amazon SageMaker, and AWS Lambda.
☆11 · Aug 5, 2021 · Updated 4 years ago
alexkolo / rag_nutrition_facts_blog
RAG-based Chatbot that helps answer questions around healthy eating & lifestyle choices, based on 1200+ science-backed blog posts of Nutr…
☆15 · Sep 15, 2025 · Updated 10 months ago
aiplaybookin / gradio-demo
Examples of demo deployment using Gradio: image classification, live webcam segmentation, APIs, tunneling, etc.
☆17 · Oct 17, 2022 · Updated 3 years ago
pavanjava / qql
SQL-like query language and CLI for Qdrant vector search engine
☆46 · Jun 13, 2026 · Updated last month
econcarol / ISLR
R and Python solutions to applied exercises in An Introduction to Statistical Learning with Applications in R (corrected 7th printing)
☆15 · Jun 4, 2025 · Updated last year
business-science / lab_59_cust_lifetime_py
Learning Lab 59: Customer Lifetime Value Python
☆14 · Mar 26, 2024 · Updated 2 years ago
wjbmattingly / youtube-florence-table
Table detection with Florence.
☆15 · Jul 11, 2024 · Updated 2 years ago
ThinamXx / cuda-mode
Making of a CUDA kernel
☆17 · May 27, 2025 · Updated last year
Joinn99 / RocketEval-ICLR
🚀 [ICLR '25] RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
☆17 · Aug 21, 2025 · Updated 11 months ago
srivatsan88 / sector
☆18 · Dec 6, 2024 · Updated last year
pik1989 / FBProphet
☆12 · Dec 29, 2021 · Updated 4 years ago
alexeygrigorev / pocketshell
Voice-first, tmux-native, agent-aware Android SSH client
☆21 · Updated this week
Logisx / AI-Senior
🤖 AI Assistant fine-tuned to provide support for coding and design questions based on the latest trends in the industry.
☆17 · Jan 14, 2024 · Updated 2 years ago
ing-bank / spark-matcher
Record matching and entity resolution at scale in Spark
☆36 · Oct 31, 2023 · Updated 2 years ago
pik1989 / Guide-on-Time-Series-Analysis-using-ARIMA-LSTM-fbProphet
Time Series Forecasting Problem
☆19 · May 9, 2020 · Updated 6 years ago
ansarifaisal12 / Agent_Mont
Comprehensive metrics, insights, and visualization for Agno and Crew AI applications
☆26 · May 21, 2025 · Updated last year
DrGabrielHarris / data-science-repo-template
A repository template using Poetry, Makefile, and pre-commit-hooks
☆22 · Nov 17, 2022 · Updated 3 years ago
dssg / pgdedupe
A simple command line interface to the datamade/dedupe library.
☆43 · Dec 26, 2022 · Updated 3 years ago
AshwinSankar17 / intro-to-tts
A notebook-based (soft) intro to modern TTS
☆18 · Jun 8, 2025 · Updated last year
NotShrirang / tinygpt
🎈 A series of lightweight GPT models featuring TinyGPT Base (~51M params) and TinyGPT2 (~95M params). Fast, creative text generation tra…
☆17 · Jun 19, 2026 · Updated last month
mburaksayici / smallevals
smallevals: CPU-fast, GPU-blazing fast offline retrieval evaluation for RAG systems with tiny QA models.
☆22 · Dec 4, 2025 · Updated 7 months ago
MaLA-LM / GlotEval
GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way
☆18 · Nov 4, 2025 · Updated 8 months ago
OlivierBinette / er-evaluation
An End-to-End Evaluation Framework for Entity Resolution Systems
☆38 · Dec 3, 2023 · Updated 2 years ago
vectara / FaithBench
☆16 · May 12, 2025 · Updated last year