Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
☆18 · Aug 28, 2023 · Updated 2 years ago
Alternatives and similar repositories for deduplication
Users who are interested in deduplication are comparing it to the libraries listed below.
- Get a list of deduped files on a ZFS filesystem ☆13 · Oct 14, 2020 · Updated 5 years ago
- Folder Git ☆14 · Nov 16, 2018 · Updated 7 years ago
- ☆39 · Jul 28, 2023 · Updated 2 years ago
- RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems ☆17 · May 25, 2020 · Updated 5 years ago
- Deduplication for cfDNA sequencing data ☆11 · Jul 5, 2017 · Updated 8 years ago
- Tool to detect (and get rid of) similar images using perceptual hashing (pHash lib) ☆83 · Nov 6, 2016 · Updated 9 years ago
- Fast duplicate file detection library ☆26 · Jan 5, 2017 · Updated 9 years ago
- Image Deduplication in Python ☆23 · May 16, 2020 · Updated 5 years ago
- String deduplication package for Go ☆19 · Jan 10, 2024 · Updated 2 years ago
- A tool for managing files using tags instead of folders ☆14 · Apr 17, 2021 · Updated 5 years ago
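- Document deduplication that addresses semantically duplicate documents in search engines, using a semantic fingerprint algorithm based on multiple hashing. ☆11 · Aug 17, 2013 · Updated 12 years ago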
- Find duplicate text files. ☆14 · Jan 14, 2025 · Updated last year
- Pile Deduplication Code ☆18 · May 15, 2023 · Updated 2 years ago
- 🕹️ Group and deduplicate concurrent tasks ☆30 · Apr 1, 2026 · Updated last month
- Python library and dashboard for hyperparameter search and model training for computer vision tasks based on PyTorch, Optuna, FiftyOne, D… ☆17 · Jul 14, 2023 · Updated 2 years ago
- Content Defined Chunking playground ☆50 · Mar 26, 2026 · Updated last month
- A Go library implementing a buzhash rolling hash function ☆31 · Aug 16, 2016 · Updated 9 years ago
- Use clonefile to deduplicate files on APFS. ☆57 · Apr 8, 2026 · Updated 3 weeks ago
- Python package for deduplication/entity resolution using active learning ☆82 · Aug 24, 2024 · Updated last year
- Utility to list duplicate files in one or more directories based on the file contents ☆24 · Sep 23, 2024 · Updated last year
- Fast and efficient content-defined chunking for data deduplication. Java implementation of FastCDC as library. ☆26 · Sep 21, 2023 · Updated 2 years ago
- TensorRT ☆11 · Sep 22, 2020 · Updated 5 years ago
- RAG-Fusion implementation using Langchain, Weaviate and OpenAI ☆13 · Oct 31, 2023 · Updated 2 years ago
- Check duplicated files ☆25 · Oct 9, 2018 · Updated 7 years ago
- A cross platform command-line tool to deduplicate files, fast ☆50 · Nov 5, 2023 · Updated 2 years ago
- An in-memory point-in-polygon (reverse geocoding) package for Who's On First data ☆10 · Sep 28, 2017 · Updated 8 years ago
- Custom AppleScript libraries providing a variety of utilities ☆17 · Sep 11, 2023 · Updated 2 years ago
- A content inspecting SMTP proxy ☆17 · Jun 9, 2014 · Updated 11 years ago
- A type decoder for objective c types ☆14 · Oct 20, 2024 · Updated last year
- ☆14 · Apr 13, 2026 · Updated 2 weeks ago
- jQuery-based Json to html pretty printer ☆26 · Jun 15, 2012 · Updated 13 years ago
- Algorithms and data structures ☆19 · Oct 12, 2023 · Updated 2 years ago
- Sample solution to build a deployment pipeline for Amazon SageMaker. ☆13 · Jul 18, 2022 · Updated 3 years ago
- fluentd input plugin to extend tail to support multiple line log ☆31 · Oct 13, 2014 · Updated 11 years ago
- An implementation of FastCDC in C ☆35 · Jun 27, 2022 · Updated 3 years ago
- Collection of small scripts to generate update feeds ☆12 · Mar 9, 2023 · Updated 3 years ago
- ☆11 · Apr 24, 2023 · Updated 3 years ago
- This is a collection of my dotfiles not related to either vim or emacs. Mostly just bashy stuff. ☆53 · Aug 24, 2014 · Updated 11 years ago
- The most powerful and fastest YouTube searching Python library. ☆13 · Oct 25, 2022 · Updated 3 years ago