A minimal, optimized Python package for deduplication that uses RapidFuzz internally to detect approximate duplicates within a dataset quickly and with minimal configuration.
☆18 · Apr 2, 2025 · Updated last year
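To make "approximate duplicate detection" concrete, here is a minimal sketch of the general idea. It is not fast-dedupe's API and it does not use RapidFuzz: it swaps in the standard library's `difflib.SequenceMatcher` as the similarity measure so it runs without extra dependencies. The function name `dedupe`, the `threshold` parameter, and the sample strings are all hypothetical.

```python
from difflib import SequenceMatcher


def dedupe(items, threshold=0.85):
    """Greedy approximate dedupe (illustrative only, not fast-dedupe's API).

    An item is kept unless its similarity to an already-kept item is at
    least `threshold`; otherwise it is recorded as a duplicate of that item.
    """
    kept = []
    duplicates = {}
    for item in items:
        norm = item.casefold().strip()  # compare case-insensitively
        match = None
        for k in kept:
            score = SequenceMatcher(None, norm, k.casefold().strip()).ratio()
            if score >= threshold:
                match = k
                break
        if match is None:
            kept.append(item)
        else:
            duplicates.setdefault(match, []).append(item)
    return kept, duplicates


records = ["Apple iPhone 12", "Apple iPhone12", "Samsung Galaxy", "apple iphone 12"]
kept, duplicates = dedupe(records)
```

This pairwise loop is quadratic in the number of kept items; a library built on RapidFuzz would presumably rely on much faster scoring routines, which is the speed advantage the description above refers to.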
Alternatives and similar repositories for fast-dedupe
Users who are interested in fast-dedupe compare it to the libraries listed below.
- EmbedDB is an ultra-lightweight vector database designed for rapid prototyping of semantic search and RAG applications. The entire implem… ☆21 · Mar 24, 2025 · Updated last year
- The DistanceMetrics package is a comprehensive Python library designed to compute a wide variety of distance metrics between two vectors,… ☆16 · Sep 25, 2025 · Updated 7 months ago
- synthetic data for ml ☆25 · Jan 30, 2025 · Updated last year
- ☆28 · Feb 11, 2026 · Updated 2 months ago
- 3D Gaussian Splatting Viewer ☆32 · Mar 7, 2026 · Updated 2 months ago
- 🚀 [ICLR '25] RocketEval: Efficient Automated LLM Evaluation via Grading Checklist ☆16 · Aug 21, 2025 · Updated 8 months ago
- ☆12 · Apr 22, 2024 · Updated 2 years ago
- ☆14 · May 12, 2025 · Updated 11 months ago
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ☆18 · Nov 4, 2025 · Updated 6 months ago
- ☆22 · Jan 13, 2025 · Updated last year
- A code-free AutoML pipeline with AutoGluon, Amazon SageMaker, and AWS Lambda. ☆11 · Aug 5, 2021 · Updated 4 years ago
- The tool to visualise architecture of python packages ☆10 · Aug 16, 2023 · Updated 2 years ago
- ☆11 · Dec 22, 2022 · Updated 3 years ago
- ☆13 · Updated this week
- ☆21 · Jun 12, 2024 · Updated last year
- ☆13 · Nov 19, 2022 · Updated 3 years ago
- 🎈 A series of lightweight GPT models featuring TinyGPT Base (~51M params) and TinyGPT2 (~95M params). Fast, creative text generation tra… ☆17 · Apr 17, 2026 · Updated 3 weeks ago
- An AI-powered literature review assistant for researchers ☆32 · Apr 18, 2025 · Updated last year
- This repository contains code for the paper RMM: A Recursive Mental Model for Dialog Navigation ☆10 · Nov 22, 2022 · Updated 3 years ago
- Apache Arrow Guide ☆17 · Oct 10, 2021 · Updated 4 years ago
- Official repository of the Manning book - Fight Fraud with Machine Learning - by Ashish Ranjan Jha ☆19 · May 24, 2025 · Updated 11 months ago
- RAG-based Chatbot that helps answer questions around healthy eating & lifestyle choices, based on 1200+ science-backed blog posts of Nutr… ☆13 · Sep 15, 2025 · Updated 7 months ago
- Dataset of sentences from Hindi stories tagged with different emotion tags ☆11 · Nov 26, 2019 · Updated 6 years ago
- Examples of demo deployment using Gradio. Image Classification, Live Webcam Segmentation, APIs, Tunneling etc. ☆17 · Oct 17, 2022 · Updated 3 years ago
- ☆19 · Oct 1, 2025 · Updated 7 months ago
- Learning Lab 59: Customer Lifetime Value Python ☆14 · Mar 26, 2024 · Updated 2 years ago
- Python SDK for dataset generation on LightningRod platform ⚡ ☆44 · May 1, 2026 · Updated last week
- Resources to learn data processing with GPT and other language models ☆21 · Dec 10, 2024 · Updated last year
- Making of cuda kernel ☆16 · May 27, 2025 · Updated 11 months ago
- Structured pruning and bias visualization for Large Language Models. Tools for LLM optimization and fairness analysis. ☆38 · Apr 19, 2026 · Updated 2 weeks ago
- This is the repo for the LegalBench-RAG Paper: https://arxiv.org/abs/2408.10343. ☆182 · May 30, 2025 · Updated 11 months ago
- ☆13 · Feb 24, 2026 · Updated 2 months ago
- Record matching and entity resolution at scale in Spark ☆36 · Oct 31, 2023 · Updated 2 years ago
- In this course you'll learn to use Gradio to create user-friendly apps with minimal code: Summarize text using a large language model, ge… ☆14 · Nov 15, 2023 · Updated 2 years ago
- Open-source NPU for LLM inference (this one runs GPT-2) ☆78 · Feb 16, 2026 · Updated 2 months ago
- 🤖 AI Assistant fine-tuned to provide support for coding and design questions based on the latest trends in the industry. ☆17 · Jan 14, 2024 · Updated 2 years ago
- This is the code for our ACL 2021 paper entitled eMLM: A New Pre-training Objective for Emotion Related Tasks ☆15 · Sep 7, 2022 · Updated 3 years ago
- Autogluon-cloud aims to provide user tools to train, fine-tune and deploy AutoGluon backed models on the cloud. With just a few lines of … ☆24 · Nov 24, 2025 · Updated 5 months ago
- Course Scheduling Management LMS - Low level design with standard design patterns using Java. ☆11 · Jul 27, 2022 · Updated 3 years ago