A minimalist, optimized Python package for deduplication that uses RapidFuzz internally to detect approximate duplicates within a dataset very quickly and with minimal configuration.
☆18 · Apr 2, 2025 · Updated last year
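This page does not show fast-dedupe's own API, so the following is only a minimal sketch of the general idea behind approximate duplicate detection: score pairs of strings for similarity on a 0-100 scale and drop any record that scores above a threshold against one already kept. The helper names `similarity`, `normalize`, `find_near_duplicates`, and `deduplicate`, and the default threshold of 85, are hypothetical. The sketch also swaps in the standard library's `difflib.SequenceMatcher` for RapidFuzz so it runs without dependencies; RapidFuzz's `fuzz.ratio` is a much faster implementation of a comparable (but not identical) similarity score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity of two strings on a 0-100 scale.

    Stand-in for RapidFuzz's fuzz.ratio (an assumption for this sketch):
    difflib uses a different matching algorithm, so scores are comparable
    but not guaranteed to be identical.
    """
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def normalize(text: str) -> str:
    """Light normalization so case and surrounding whitespace do not block a match."""
    return text.strip().lower()


def find_near_duplicates(records, threshold=85.0):
    """Return (i, j, score) for each pair of records scoring >= threshold.

    Hypothetical helper, not fast-dedupe's API. It compares all n*(n-1)/2
    pairs, which is fine for a sketch but quadratic in the number of records.
    """
    norm = [normalize(r) for r in records]
    pairs = []
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            score = similarity(norm[i], norm[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 1)))
    return pairs


def deduplicate(records, threshold=85.0):
    """Keep the first record of each group of near-duplicates (hypothetical helper).

    A record is dropped if it scores >= threshold against any record already kept.
    """
    kept, kept_norm = [], []
    for record in records:
        candidate = normalize(record)
        if any(similarity(candidate, k) >= threshold for k in kept_norm):
            continue  # near-duplicate of an earlier, already kept record
        kept.append(record)
        kept_norm.append(candidate)
    return kept


if __name__ == "__main__":
    names = ["Apple Inc.", "apple inc", "Google LLC"]
    print(find_near_duplicates(names))  # [(0, 1, 94.7)]
    print(deduplicate(names))           # ['Apple Inc.', 'Google LLC']
```

A package built for speed would likely avoid this pure-Python double loop, for example by scoring in batches with RapidFuzz's `process.cdist`; that is an assumption about how such a tool could work, not a description of fast-dedupe's internals.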
Alternatives and similar repositories for fast-dedupe
Users who are interested in fast-dedupe are comparing it to the libraries listed below.
- EmbedDB is an ultra-lightweight vector database designed for rapid prototyping of semantic search and RAG applications. The entire implem… ☆21 · Mar 24, 2025 · Updated last year
- Synthetic data for ML ☆25 · Jan 30, 2025 · Updated last year
- Fast search index for SPLADE sparse retrieval models implemented in Python using NumPy and Numba ☆38 · Oct 16, 2025 · Updated 7 months ago
- ☆10 · Nov 12, 2024 · Updated last year
- ☆28 · Feb 11, 2026 · Updated 3 months ago
- ☆22 · Jun 5, 2025 · Updated 11 months ago
- 3D Gaussian Splatting Viewer ☆32 · Mar 7, 2026 · Updated 2 months ago
- 🚀 [ICLR '25] RocketEval: Efficient Automated LLM Evaluation via Grading Checklist ☆16 · Aug 21, 2025 · Updated 9 months ago
- ☆12 · Apr 22, 2024 · Updated 2 years ago
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way ☆18 · Nov 4, 2025 · Updated 6 months ago
- Example files used in the DuckDB - Unity Catalog blog ☆10 · Dec 6, 2024 · Updated last year
- ☆22 · Jan 13, 2025 · Updated last year
- Cybersecurity skills for AI coding agents (Claude Code, Cursor, Codex) ☆182 · Mar 13, 2026 · Updated 2 months ago
- CodeRepoQA dataset ☆15 · Feb 19, 2025 · Updated last year
- A tool to visualise the architecture of Python packages ☆10 · Aug 16, 2023 · Updated 2 years ago
- ☆11 · Dec 22, 2022 · Updated 3 years ago
- Open-source web scraping API. Turn any website into clean markdown or structured JSON. Anti-detect browser, proxy auto-selection, self-ho… ☆102 · May 6, 2026 · Updated 3 weeks ago
- ☆21 · Jun 12, 2024 · Updated last year
- ☆13 · Nov 19, 2022 · Updated 3 years ago
- LUMIN: Your data analysis companion that turns natural language questions into powerful insights through AI-driven visualizations and cle… ☆19 · Nov 11, 2024 · Updated last year
- 🎈 A series of lightweight GPT models featuring TinyGPT Base (~51M params) and TinyGPT2 (~95M params). Fast, creative text generation tra… ☆17 · Apr 17, 2026 · Updated last month
- An AI-powered literature review assistant for researchers ☆36 · May 7, 2026 · Updated 3 weeks ago
- Apache Arrow Guide ☆17 · Oct 10, 2021 · Updated 4 years ago
- Turn your meeting transcripts into a Wikipedia for your company. ☆42 · Apr 11, 2026 · Updated last month
- Examples of demo deployment using Gradio: Image Classification, Live Webcam Segmentation, APIs, Tunneling, etc. ☆17 · Oct 17, 2022 · Updated 3 years ago
- ☆26 · Jun 10, 2025 · Updated 11 months ago
- ☆19 · Oct 1, 2025 · Updated 7 months ago
- Learning Lab 59: Customer Lifetime Value Python ☆14 · Mar 26, 2024 · Updated 2 years ago
- Python SDK for dataset generation on LightningRod platform ⚡ ☆47 · May 18, 2026 · Updated last week
- Table detection with Florence. ☆15 · Jul 11, 2024 · Updated last year
- Resources to learn data processing with GPT and other language models ☆21 · Dec 10, 2024 · Updated last year
- Making of a CUDA kernel ☆16 · May 27, 2025 · Updated last year
- Structured pruning and bias visualization for Large Language Models. Tools for LLM optimization and fairness analysis. ☆40 · May 16, 2026 · Updated last week
- ☆18 · Dec 6, 2024 · Updated last year
- This is the repo for the LegalBench-RAG Paper: https://arxiv.org/abs/2408.10343. ☆188 · May 30, 2025 · Updated 11 months ago
- ☆13 · Feb 24, 2026 · Updated 3 months ago
- Python implementation of METEOR ☆16 · Nov 20, 2018 · Updated 7 years ago
- In this course you'll learn to use Gradio to create user-friendly apps with minimal code: Summarize text using a large language model, ge… ☆14 · Nov 15, 2023 · Updated 2 years ago
- 🤖 AI Assistant fine-tuned to provide support for coding and design questions based on the latest trends in the industry. ☆17 · Jan 14, 2024 · Updated 2 years ago