Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
☆18Aug 28, 2023Updated 2 years ago
Alternatives and similar repositories for deduplication
Users who are interested in deduplication are comparing it to the libraries listed below.
- Folder Git☆14Nov 16, 2018Updated 7 years ago
- ☆39Jul 28, 2023Updated 2 years ago
- Deduplication for cfDNA sequencing data☆11Jul 5, 2017Updated 8 years ago
- A Python tool to search for and remove duplicated files in messy datasets☆15Dec 23, 2024Updated last year
- Compute statistics on git repositories☆10May 29, 2019Updated 7 years ago
- String deduplication package for Go☆19Jan 10, 2024Updated 2 years ago
- A tool for managing files using tags instead of folders☆15Apr 17, 2021Updated 5 years ago
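- Document deduplication that addresses semantically duplicated documents in search engines, using a semantic fingerprint algorithm based on multiple hashing.☆11Aug 17, 2013Updated 12 years ago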
- A file system to transparently read RAR files by representing them as directories.☆11Dec 31, 2017Updated 8 years ago
- Create and access rar archives.☆16May 10, 2016Updated 10 years ago
- 🕹️ Group and deduplicate concurrent tasks☆31May 15, 2026Updated last month
- POSIX-compliant Linux shell utility designed to search files based on their extended attributes.☆14Sep 17, 2022Updated 3 years ago
- AVR-based monitoring of home electricity consumption☆15Jan 4, 2026Updated 5 months ago
- A music spectrum analyser and visualisation program for squeezelite☆14Jan 30, 2023Updated 3 years ago
- Init and management script for mounting rewritable squashfs-compressed data☆45Jun 20, 2025Updated last year
- A Go library implementing a buzhash rolling hash function☆31Aug 16, 2016Updated 9 years ago
- Python package for deduplication/entity resolution using active learning☆82Aug 24, 2024Updated last year
- 📺 Self-hosted, open source YouTube subscription management system☆11Apr 2, 2024Updated 2 years ago
- A tool & Python 3 library to decompress anything☆12Jan 24, 2021Updated 5 years ago
- A Python FUSE file system that features transparent deduplication and compression which make it ideal for archiving backups.☆140Jul 22, 2010Updated 15 years ago
- A GoLang implementation of Fiche as provided by solusipse/fiche☆11Jan 5, 2018Updated 8 years ago
- Fast and efficient content-defined chunking for data deduplication. Java implementation of FastCDC as library.☆26Sep 21, 2023Updated 2 years ago
- Scripts to build openrisc toolchain and bootable filesystem☆12Sep 15, 2014Updated 11 years ago
- TensorRT☆11Sep 22, 2020Updated 5 years ago
- Docker configuration for davos☆14Nov 21, 2025Updated 7 months ago
- super-Django-CC is a simple web interface for commoncrawl.org☆15Dec 8, 2022Updated 3 years ago
- Exploit for uTorrent vulnerability CVE-2020-8437 by mavlevin☆11Feb 1, 2026Updated 5 months ago
- Basically my ~/bin folder.☆50Apr 19, 2026Updated 2 months ago
- Rust bindings to the Knot Resolver library (also known as libkres)☆18Apr 2, 2019Updated 7 years ago
- A brief example of how to get Django up and running with Kubernetes☆30Mar 11, 2023Updated 3 years ago
- RAG-Fusion implementation using Langchain, Weaviate and OpenAI☆13Oct 31, 2023Updated 2 years ago
- Demo of obsidiantools Python package (for Binder)☆13Jul 8, 2025Updated 11 months ago
- Supercharged pandas indexing☆11Mar 28, 2021Updated 5 years ago
- A cross platform command-line tool to deduplicate files, fast☆51Nov 5, 2023Updated 2 years ago
- Log4j_dos_CVE-2021-45105☆13Dec 19, 2021Updated 4 years ago
- Ongoing research training Mixture of Expert models.☆22Sep 16, 2024Updated last year
- Custom AppleScript libraries providing a variety of utilities☆18Sep 11, 2023Updated 2 years ago
- Fast, lightweight MaxMind GeoIP lookup server written in Rust☆16Jun 24, 2026Updated last week
- GFS: a Graph-based File System Enhanced with Semantic Features☆30May 27, 2021Updated 5 years ago