A minimal, optimized Python package for deduplication that uses RapidFuzz internally, enabling fast approximate duplicate detection within a dataset with minimal configuration.
☆18 · Apr 2, 2025 · Updated last year
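As a rough illustration of what approximate duplicate detection involves, here is a minimal sketch. It is not fast-dedupe's API: the `similarity` and `dedupe` functions and the threshold of 85 are hypothetical choices made for this example, and the standard library's `difflib.SequenceMatcher` stands in for a RapidFuzz scorer so that the snippet has no dependencies.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 score; a stdlib stand-in for a RapidFuzz scorer (assumption)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe(items: list[str], threshold: float = 85.0) -> tuple[list[str], dict[str, list[str]]]:
    """Greedy deduplication: the first item seen in each near-duplicate group is kept."""
    kept: list[str] = []
    duplicates: dict[str, list[str]] = {}
    for item in items:
        for representative in kept:
            # Compare against items already kept; stop at the first close match.
            if similarity(item, representative) >= threshold:
                duplicates.setdefault(representative, []).append(item)
                break
        else:
            # No kept item was similar enough, so this one starts a new group.
            kept.append(item)
    return kept, duplicates


names = ["Apple iPhone 12", "Apple iPhone12", "Google Pixel", "apple iphone 12"]
clean, dupes = dedupe(names)
print(clean)
print(dupes)
```

This greedy loop compares each item against every kept item, so it is quadratic in the worst case. A library built for speed would likely use a faster scorer and some way to limit the number of comparisons.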
Alternatives and similar repositories for fast-dedupe
Users that are interested in fast-dedupe are comparing it to the libraries listed below.
- EmbedDB is an ultra-lightweight vector database designed for rapid prototyping of semantic search and RAG applications. The entire implem… ☆21 · Mar 24, 2025 · Updated last year
- synthetic data for ml ☆25 · Jan 30, 2025 · Updated last year
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆38 · Oct 16, 2025 · Updated 8 months ago
- ☆12 · Apr 22, 2024 · Updated 2 years ago
- Example files used in the DuckDB - Unity Catalog blog ☆10 · Dec 6, 2024 · Updated last year
- ☆22 · Jan 13, 2025 · Updated last year
- CodeRepoQA dataset ☆15 · Feb 19, 2025 · Updated last year
- The tool to visualise architecture of python packages ☆10 · Aug 16, 2023 · Updated 2 years ago
- ☆11 · Dec 22, 2022 · Updated 3 years ago
- Iterate fast on your RAG pipelines ☆24 · Jun 21, 2025 · Updated 11 months ago
- Notes on how to set up your backend instance ☆11 · May 29, 2024 · Updated 2 years ago
- Open-source web scraping API. Turn any website into clean markdown or structured JSON. Anti-detect browser, proxy auto-selection, self-ho… ☆150 · Updated this week
- ☆13 · May 26, 2026 · Updated 3 weeks ago
- ☆21 · Jun 12, 2024 · Updated 2 years ago
- Playing with Python Bluesky SDK ☆15 · Nov 18, 2024 · Updated last year
- ☆13 · Nov 19, 2022 · Updated 3 years ago
- Model data exports for Django ☆38 · Nov 23, 2021 · Updated 4 years ago
- LUMIN: Your data analysis companion that turns natural language questions into powerful insights through AI-driven visualizations and cle… ☆19 · Nov 11, 2024 · Updated last year
- 🎈 A series of lightweight GPT models featuring TinyGPT Base (~51M params) and TinyGPT2 (~95M params). Fast, creative text generation tra… ☆17 · Apr 17, 2026 · Updated 2 months ago
- Apache Arrow Guide ☆17 · Oct 10, 2021 · Updated 4 years ago
- Examples of demo deployment using Gradio. Image Classification, Live Webcam Segmentation, APIs, Tunneling, etc. ☆17 · Oct 17, 2022 · Updated 3 years ago
- ☆26 · Jun 10, 2025 · Updated last year
- ☆19 · Oct 1, 2025 · Updated 8 months ago
- Learning Lab 59: Customer Lifetime Value Python ☆14 · Mar 26, 2024 · Updated 2 years ago
- Python SDK for dataset generation on LightningRod platform ⚡ ☆51 · Updated this week
- Table detection with Florence. ☆15 · Jul 11, 2024 · Updated last year
- Resources to learn data processing with GPT and other language models ☆21 · Dec 10, 2024 · Updated last year
- Making of cuda kernel ☆17 · May 27, 2025 · Updated last year
- Structured pruning and bias visualization for Large Language Models. Tools for LLM optimization and fairness analysis. ☆41 · May 31, 2026 · Updated 2 weeks ago
- ☆18 · Dec 6, 2024 · Updated last year
- Cybersecurity skills for AI coding agents (Claude Code, Cursor, Codex) ☆251 · May 27, 2026 · Updated 3 weeks ago
- Python implementation of METEOR ☆17 · Nov 20, 2018 · Updated 7 years ago
- 🤖 AI Assistant fine-tuned to provide support for coding and design questions based on the latest trends in the industry. ☆17 · Jan 14, 2024 · Updated 2 years ago
- Course Scheduling Management LMS - Low level design with standard design patterns using Java. ☆11 · Jul 27, 2022 · Updated 3 years ago
- R and Python solutions to applied exercises in An Introduction to Statistical Learning with Applications in R (corrected 7th printing) ☆16 · Jun 4, 2025 · Updated last year
- Comprehensive metrics, insights, and visualization for Agno and Crew AI applications ☆26 · May 21, 2025 · Updated last year
- ☆30 · Apr 14, 2025 · Updated last year
- SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data For High-Stakes Domains (EMNLP 2025 System Demonstration) ☆27 · Nov 3, 2025 · Updated 7 months ago
- A python implementation of the Ensemble Biclustering for Classification (EBC) algorithm. EBC is a co-clustering algorithm that allows you… ☆20 · Apr 7, 2017 · Updated 9 years ago