ayushgupta4897 / fast-dedupe
A minimalist, optimized Python package for deduplication that uses RapidFuzz internally to detect approximate duplicates within a dataset quickly and with minimal configuration.
☆18 Apr 2, 2025 Updated 10 months ago
Alternatives and similar repositories for fast-dedupe
Users who are interested in fast-dedupe are comparing it to the libraries listed below.
- EmbedDB is an ultra-lightweight vector database designed for rapid prototyping of semantic search and RAG applications. The entire implem… ☆21 Mar 24, 2025 Updated 10 months ago
- synthetic data for ml ☆25 Jan 30, 2025 Updated last year
- ☆10 Nov 12, 2024 Updated last year
- Code for COLING 2022 accepted paper titled "MuCDN: Mutual Conversational Detachment Network for Emotion Recognition in Multi-Party Conver… ☆10 Jul 21, 2023 Updated 2 years ago
- ☆13 Sep 23, 2025 Updated 4 months ago
- Multi-Agent Deep RAG ☆33 Feb 8, 2026 Updated last week
- ☆11 Jul 30, 2025 Updated 6 months ago
- Learning Lab 59: Customer Lifetime Value Python ☆14 Mar 26, 2024 Updated last year
- 🚀 [ICLR '25] RocketEval: Efficient Automated LLM Evaluation via Grading Checklist ☆15 Aug 21, 2025 Updated 5 months ago
- RAG-based Chatbot that helps answer questions around healthy eating & lifestyle choices, based on 1200+ science-backed blog posts of Nutr… ☆13 Sep 15, 2025 Updated 5 months ago
- Course Scheduling Management LMS - Low level design with standard design patterns using Java. ☆11 Jul 27, 2022 Updated 3 years ago
- ☆25 Updated this week
- DRL-Cache is an NGINX dynamic module + a tiny inference sidecar. When space is needed, the module asks a 4–8 KB dueling-DQN policy which … ☆34 Aug 17, 2025 Updated 5 months ago
- R and Python solutions to applied exercises in An Introduction to Statistical Learning with Applications in R (corrected 7th printing) ☆16 Jun 4, 2025 Updated 8 months ago
- CodeRepoQA dataset ☆15 Feb 19, 2025 Updated 11 months ago
- ☆11 Dec 22, 2022 Updated 3 years ago
- ☆12 Apr 22, 2024 Updated last year
- ☆12 Dec 29, 2021 Updated 4 years ago
- Official repository of the Manning book - Fight Fraud with Machine Learning - by Ashish Ranjan Jha ☆18 May 24, 2025 Updated 8 months ago
- Apache Arrow Guide ☆17 Oct 10, 2021 Updated 4 years ago
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ☆18 Nov 4, 2025 Updated 3 months ago
- In this course you'll learn to use Gradio to create user-friendly apps with minimal code: Summarize text using a large language model, ge… ☆14 Nov 15, 2023 Updated 2 years ago
- ☆13 Nov 19, 2022 Updated 3 years ago
- ☆17 Apr 19, 2024 Updated last year
- ☆19 Oct 1, 2025 Updated 4 months ago
- Time Series Forecasting Problem ☆19 May 9, 2020 Updated 5 years ago
- This is the code for our ACL 2021 paper entitled eMLM: A New Pre-training Objective for Emotion Related Tasks ☆15 Sep 7, 2022 Updated 3 years ago
- Resources to learn data processing with GPT and other language models ☆21 Dec 10, 2024 Updated last year
- Examples of demo deployment using Gradio. Image Classification, Live Webcam Segmentation, APIs, Tunneling etc. ☆17 Oct 17, 2022 Updated 3 years ago
- ☆22 Jan 13, 2025 Updated last year
- Table detection with Florence. ☆15 Jul 11, 2024 Updated last year
- Playing with Python Bluesky SDK ☆15 Nov 18, 2024 Updated last year
- Making of cuda kernel ☆17 May 27, 2025 Updated 8 months ago
- 🤖 AI Assistant fine-tuned to provide support for coding and design questions based on the latest trends in the industry. ☆17 Jan 14, 2024 Updated 2 years ago
- Python implementation of METEOR ☆15 Nov 20, 2018 Updated 7 years ago
- ☆25 Jun 10, 2025 Updated 8 months ago
- Spark Calibration - A python package for calibrating probabilities predicted by ML model involving large training & test datasets as spar… ☆18 Dec 10, 2025 Updated 2 months ago
- Structured pruning and bias visualization for Large Language Models. Tools for LLM optimization and fairness analysis. ☆28 Feb 4, 2026 Updated last week
- ☆22 Oct 22, 2023 Updated 2 years ago