Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
☆19 · Aug 28, 2023 · Updated 2 years ago
Alternatives and similar repositories for deduplication
Users that are interested in deduplication are comparing it to the libraries listed below.
- Find near-duplicate documents using minhashing implemented in Go. ☆16 · Dec 22, 2015 · Updated 10 years ago
- ☆39 · Jul 28, 2023 · Updated 2 years ago
- RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems ☆17 · May 25, 2020 · Updated 5 years ago
- Compute statistics on git repositories ☆10 · May 29, 2019 · Updated 6 years ago
- Tool to detect (and get rid of) similar images using perceptual hashing (pHash lib) ☆84 · Nov 6, 2016 · Updated 9 years ago
- Fast duplicate file detection library ☆26 · Jan 5, 2017 · Updated 9 years ago
- Visual hashes ☆26 · Mar 21, 2017 · Updated 9 years ago
- Create and access rar archives. ☆16 · May 10, 2016 · Updated 9 years ago
- 🕹️ Group and deduplicate concurrent tasks ☆29 · Apr 1, 2026 · Updated last week
- Wrapper for mirscreencast and ffmpeg to record Unity 8 desktop videos. ☆13 · Oct 30, 2016 · Updated 9 years ago
- AVR-based monitoring of home electricity consumption ☆15 · Jan 4, 2026 · Updated 3 months ago
- Python library and dashboard for hyperparameter search and model training for computer vision tasks based on PyTorch, Optuna, FiftyOne, D… ☆17 · Jul 14, 2023 · Updated 2 years ago
- Variable-sized block deduplication archival backed by Plan9's venti ☆17 · Jul 15, 2024 · Updated last year
- 📺 Self-hosted, open source YouTube subscription management system ☆11 · Apr 2, 2024 · Updated 2 years ago
- Multiple ways of chunking for data deduplication: Fixed size chunking, Content defined chunking, and File based chunking. ☆19 · Dec 20, 2013 · Updated 12 years ago
- .foos for foos & more ☆22 · Jun 14, 2023 · Updated 2 years ago
- Copilot with deepseek and more... ☆13 · Mar 7, 2025 · Updated last year
- FastCDC implementation in Python https://pypi.org/project/fastcdc/ ☆63 · Jun 27, 2024 · Updated last year
- ☆10 · Jun 22, 2020 · Updated 5 years ago
- A Golang package that implements CDC chunkers with a generic interface ☆122 · Jan 22, 2026 · Updated 2 months ago
- RAG-Fusion implementation using Langchain, Weaviate and OpenAI ☆13 · Oct 31, 2023 · Updated 2 years ago
- Demo of obsidiantools Python package (for Binder) ☆13 · Jul 8, 2025 · Updated 9 months ago
- Supercharged pandas indexing ☆11 · Mar 28, 2021 · Updated 5 years ago
- A cross-platform command-line tool to deduplicate files, fast ☆49 · Nov 5, 2023 · Updated 2 years ago
- Custom AppleScript libraries providing a variety of utilities ☆17 · Sep 11, 2023 · Updated 2 years ago
- Go implementation of the FastCDC content-defined chunking algorithm ☆82 · Aug 14, 2023 · Updated 2 years ago
- A type decoder for Objective-C types ☆14 · Oct 20, 2024 · Updated last year
- Datasource Components for KnockoutJs for paging, sorting and filtering remote sources. ☆25 · Jul 25, 2013 · Updated 12 years ago
- Repo for the IDESSAI 2024 course on modeling audio with discrete tokens. ☆13 · Sep 13, 2024 · Updated last year
- jQuery-based JSON to HTML pretty printer ☆26 · Jun 15, 2012 · Updated 13 years ago
- Algorithms and data structures ☆19 · Oct 12, 2023 · Updated 2 years ago
- A Go(lang) IDS rule parser ☆13 · Jun 10, 2019 · Updated 6 years ago
- Lets you reorganize videos in your watch later list to various playlists (for watch later hoarders like me!) ☆10 · Feb 12, 2015 · Updated 11 years ago
- A collection of selected models built with AllenNLP. ☆25 · Feb 20, 2020 · Updated 6 years ago
- fluentd input plugin to extend tail to support multi-line logs ☆31 · Oct 13, 2014 · Updated 11 years ago
- An implementation of FastCDC in C ☆35 · Jun 27, 2022 · Updated 3 years ago
- Cache any function calls to Deta Base. ☆12 · Oct 9, 2022 · Updated 3 years ago
- Collection of small scripts to generate update feeds ☆12 · Mar 9, 2023 · Updated 3 years ago
- The most powerful and fastest YouTube searching Python library. ☆13 · Oct 25, 2022 · Updated 3 years ago