Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
☆18 · Aug 28, 2023 · Updated 2 years ago
Alternatives and similar repositories for deduplication
Users who are interested in deduplication are comparing it to the libraries listed below.
- Find near-duplicate documents using minhashing implemented in Go. ☆16 · Dec 22, 2015 · Updated 10 years ago
- Get a list of deduped files on a ZFS filesystem ☆13 · Oct 14, 2020 · Updated 5 years ago
- Folder Git ☆14 · Nov 16, 2018 · Updated 7 years ago
- ☆39 · Jul 28, 2023 · Updated 2 years ago
- A merged read deduplication tool capable of performing merged read deduplication on single-end data. ☆13 · Sep 4, 2024 · Updated last year
- RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems ☆17 · May 25, 2020 · Updated 5 years ago
- Create snapshot commits on a not checked-out branch without touching the working tree or losing staged changes ☆17 · Mar 16, 2026 · Updated 2 months ago
- Fast duplicate file detection library ☆26 · Jan 5, 2017 · Updated 9 years ago
- Tool to detect (and get rid of) similar images using perceptual hashing (pHash lib) ☆83 · Nov 6, 2016 · Updated 9 years ago
- Price options by fitting a Lévy distribution ☆10 · Jan 20, 2021 · Updated 5 years ago
- Deduplicating filesystem via Python3, FUSE and SQLite ☆30 · Feb 17, 2026 · Updated 3 months ago
- Visual hashes ☆26 · Mar 21, 2017 · Updated 9 years ago
- A simple API that can generate various types of hexagon grids - returns GeoJSON data or load into PostGIS with performant JDBC. ☆10 · May 6, 2026 · Updated 2 weeks ago
- A React/MUI component to visualize and explore RDF entities ☆11 · Oct 15, 2024 · Updated last year
- Wrapper for mirscreencast and ffmpeg to record Unity 8 desktop videos. ☆13 · Oct 30, 2016 · Updated 9 years ago
- POSIX-compliant Linux shell utility designed to search files based on their extended attributes. ☆14 · Sep 17, 2022 · Updated 3 years ago
- A FUSE filesystem that stores data on Git ☆31 · Nov 2, 2020 · Updated 5 years ago
- Python library and dashboard for hyperparameter search and model training for computer vision tasks based on PyTorch, Optuna, FiftyOne, D… ☆17 · Jul 14, 2023 · Updated 2 years ago
- Toon Shading demo for CSC 562 ☆15 · May 5, 2017 · Updated 9 years ago
- Init and management script for mounting rewritable squashfs-compressed data ☆45 · Jun 20, 2025 · Updated 11 months ago
- A Go library implementing a buzhash rolling hash function ☆31 · Aug 16, 2016 · Updated 9 years ago
- ✨ Epris is a JavaScript library that simplifies interface development ☆26 · May 30, 2022 · Updated 3 years ago
- Simulation of RRT, RRT*, RRT*-FN and RRT*-FND algorithms. ☆14 · May 15, 2023 · Updated 3 years ago
- Python library for duplicate line removal, written in C++ ☆32 · Aug 11, 2025 · Updated 9 months ago
- .foos for foos & more ☆22 · Jun 14, 2023 · Updated 2 years ago
- Converts HTTrack crawls to WARC files ☆34 · Aug 6, 2024 · Updated last year
- Fast and efficient content-defined chunking for data deduplication. Java implementation of FastCDC as library. ☆26 · Sep 21, 2023 · Updated 2 years ago
- Scripts to build openrisc toolchain and bootable filesystem ☆12 · Sep 15, 2014 · Updated 11 years ago
- Lightweight, header-only, C++17 configuration library ☆22 · Feb 18, 2022 · Updated 4 years ago
- FastCDC implementation in Python https://pypi.org/project/fastcdc/ ☆65 · Jun 27, 2024 · Updated last year
- ☆10 · Jun 22, 2020 · Updated 5 years ago
- Basically my ~/bin folder. ☆50 · Apr 19, 2026 · Updated last month
- Output text together with randomly generated ASCII robots in colors inspired by synthwave/rainbows ☆29 · May 2, 2026 · Updated 2 weeks ago
- Demo of obsidiantools Python package (for Binder) ☆13 · Jul 8, 2025 · Updated 10 months ago
- A type decoder for objective c types ☆14 · Oct 20, 2024 · Updated last year
- Algorithms and data structures ☆19 · Oct 12, 2023 · Updated 2 years ago
- Lets you reorganize videos in your watch later list to various playlists (for watch later hoarders like me!) ☆10 · Feb 12, 2015 · Updated 11 years ago
- Cache any function calls to Deta Base. ☆13 · Oct 9, 2022 · Updated 3 years ago
- Functional composable pipelines allowing clean separation of the business logic and its implementation ☆11 · Sep 6, 2025 · Updated 8 months ago