A minimalist but optimized Python package for deduplication tasks leveraging RapidFuzz internally, enabling super-fast approximate duplicate detection within a dataset with minimal config.
☆18 · Apr 2, 2025 · Updated 11 months ago
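As a rough illustration of what threshold-based approximate duplicate detection looks like, here is a minimal sketch. It is not fast-dedupe's API: the description above says the package uses RapidFuzz internally, while this sketch swaps in the standard library's `difflib.SequenceMatcher` so it runs without extra dependencies. The names `similarity` and `dedupe` and the default threshold of 85 are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score, ignoring case (stdlib stand-in for a RapidFuzz scorer)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe(items, threshold=85.0):
    """Greedy near-duplicate removal (hypothetical helper, not the package's function).

    Keeps the first occurrence of each cluster and records later items whose
    similarity to a kept item reaches the threshold.
    """
    unique = []
    duplicates = {}
    for item in items:
        for kept in unique:
            if similarity(item, kept) >= threshold:
                # Near-match of an item we already kept: record it and drop it.
                duplicates.setdefault(kept, []).append(item)
                break
        else:
            # No kept item was similar enough, so this one is new.
            unique.append(item)
    return unique, duplicates


names = ["Apple iPhone 12", "Apple iPhone12", "Samsung Galaxy", "apple iphone 12"]
unique, duplicates = dedupe(names)
print(unique)
print(duplicates)
```

This greedy approach compares each item against every kept item, so it is quadratic in the worst case; a dedicated library would typically rely on faster scorers and batching to handle larger datasets.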
Alternatives and similar repositories for fast-dedupe
Users that are interested in fast-dedupe are comparing it to the libraries listed below.
- EmbedDB is an ultra-lightweight vector database designed for rapid prototyping of semantic search and RAG applications. The entire implem… ☆21 · Mar 24, 2025 · Updated last year
- synthetic data for ml ☆25 · Jan 30, 2025 · Updated last year
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆37 · Oct 16, 2025 · Updated 5 months ago
- ☆10 · Nov 12, 2024 · Updated last year
- ☆27 · Feb 11, 2026 · Updated last month
- Template for building a Singer Target ☆20 · Sep 3, 2024 · Updated last year
- ☆22 · Jun 5, 2025 · Updated 9 months ago
- ☆12 · Apr 22, 2024 · Updated last year
- ☆15 · May 12, 2025 · Updated 10 months ago
- 🚀 [ICLR '25] RocketEval: Efficient Automated LLM Evaluation via Grading Checklist ☆15 · Aug 21, 2025 · Updated 7 months ago
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ☆18 · Nov 4, 2025 · Updated 4 months ago
- ☆22 · Jan 13, 2025 · Updated last year
- An AI-powered literature review assistant for researchers ☆25 · Apr 18, 2025 · Updated 11 months ago
- A code-free AutoML pipeline with AutoGluon, Amazon SageMaker, and AWS Lambda. ☆11 · Aug 5, 2021 · Updated 4 years ago
- ☆11 · Dec 22, 2022 · Updated 3 years ago
- The tool to visualise architecture of python packages ☆10 · Aug 16, 2023 · Updated 2 years ago
- Notes on how to set up your backend instance ☆12 · May 29, 2024 · Updated last year
- 🎈 A series of lightweight GPT models featuring TinyGPT Base (~51M params) and TinyGPT2 (~95M params). Fast, creative text generation tra… ☆16 · Mar 9, 2026 · Updated 2 weeks ago
- ☆13 · Sep 23, 2025 · Updated 6 months ago
- ☆13 · Nov 19, 2022 · Updated 3 years ago
- Apache Arrow Guide ☆17 · Oct 10, 2021 · Updated 4 years ago
- RAG-based Chatbot that helps answer questions around healthy eating & lifestyle choices, based on 1200+ science-backed blog posts of Nutr… ☆13 · Sep 15, 2025 · Updated 6 months ago
- Examples of demo deployment using Gradio. Image Classification, Live Webcam Segmentation, APIs, Tunneling etc. ☆17 · Oct 17, 2022 · Updated 3 years ago
- ☆25 · Jun 10, 2025 · Updated 9 months ago
- Structured pruning and bias visualization for Large Language Models. Tools for LLM optimization and fairness analysis. ☆29 · Mar 14, 2026 · Updated 2 weeks ago
- ☆19 · Oct 1, 2025 · Updated 5 months ago
- Python SDK for dataset generation on LightningRod platform ⚡ ☆37 · Mar 20, 2026 · Updated last week
- Table detection with Florence. ☆15 · Jul 11, 2024 · Updated last year
- Making of cuda kernel ☆16 · May 27, 2025 · Updated 10 months ago
- A repository template using Poetry, Makefile, and pre-commit-hooks ☆22 · Nov 17, 2022 · Updated 3 years ago
- ☆18 · Dec 6, 2024 · Updated last year
- ☆13 · Feb 24, 2026 · Updated last month
- Record matching and entity resolution at scale in Spark ☆36 · Oct 31, 2023 · Updated 2 years ago
- In this course you'll learn to use Gradio to create user-friendly apps with minimal code: Summarize text using a large language model, ge… ☆14 · Nov 15, 2023 · Updated 2 years ago
- Autogluon-cloud aims to provide user tools to train, fine-tune and deploy AutoGluon backed models on the cloud. With just a few lines of … ☆24 · Nov 24, 2025 · Updated 4 months ago
- Course Scheduling Management LMS - Low level design with standard design patterns using Java. ☆11 · Jul 27, 2022 · Updated 3 years ago
- Time Series Forecasting Problem ☆19 · May 9, 2020 · Updated 5 years ago
- R and Python solutions to applied exercises in An Introduction to Statistical Learning with Applications in R (corrected 7th printing) ☆16 · Jun 4, 2025 · Updated 9 months ago
- Codebase for running (conditional) probing experiments ☆21 · Nov 13, 2022 · Updated 3 years ago