Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.
☆19Aug 28, 2023Updated 2 years ago
Alternatives and similar repositories for deduplication
Users who are interested in deduplication compare it with the libraries listed below
- Folder Git☆14Nov 16, 2018Updated 7 years ago
- Get a list of deduped files on a ZFS filesystem☆13Oct 14, 2020Updated 5 years ago
- RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems☆17May 25, 2020Updated 5 years ago
- Deduplicating filesystem via Python3, FUSE and SQLite☆28Feb 17, 2026Updated last week
- Fast duplicate file detection library☆26Jan 5, 2017Updated 9 years ago
- Converts HTTrack crawls to WARC files☆34Aug 6, 2024Updated last year
- Init and management script for mounting rewritable squashfs-compressed data☆45Jun 20, 2025Updated 8 months ago
- New word discovery/new word mining/degree of freedom/degree of cohesion/python3☆10May 28, 2019Updated 6 years ago
- Disable Target API Block☆26Oct 18, 2025Updated 4 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 2 years ago
- Style Transfer by Rigid Alignment in Neural Net Feature Space☆11Jan 23, 2021Updated 5 years ago
- [Android 11-13] Set a static IP for the mobile hotspot☆10Mar 5, 2024Updated last year
- Architecture for the Twint scraper that allows downloading tweets across many instances without API restrictions☆10Nov 30, 2020Updated 5 years ago
- Generates a YouTube playlist from a list of URLs.☆10Aug 14, 2023Updated 2 years ago
- A file system to transparently read RAR files by representing them as directories.☆11Dec 31, 2017Updated 8 years ago
- Dedup and compress your device mapper devices. Works especially well with thin provisioning.☆10Dec 4, 2025Updated 2 months ago
- Douban movie review visualization☆10May 19, 2016Updated 9 years ago
- ArchiveWeb.page Express!☆14Nov 1, 2024Updated last year
- Next generation linbo☆12Jan 31, 2026Updated last month
- C++ rewrite of PPPwn (PlayStation 4 PPPoE RCE)☆10Feb 27, 2025Updated last year
- Twitter-based sentiment analysis using Java and Hadoop. In this project we are doing sentiment analysis on Twitter data to analyse wh…☆10Apr 22, 2018Updated 7 years ago
- A Python package for accessing the OpenCorporates API☆11Feb 12, 2019Updated 7 years ago
- I have created a dataset of image-text pairs by using the cosine similarity of the CLIP embeddings of the image & its caption derived f…☆16Apr 22, 2021Updated 4 years ago
- This program scrapes article content from the web, given either a single URL or a text file containing a set of URLs. This project…☆17Aug 5, 2020Updated 5 years ago
- golang package to provide lightweight internal pub/sub for goroutines☆29Jan 23, 2014Updated 12 years ago
- Reborn cscope extension for emacs☆10Apr 21, 2024Updated last year
- Detects scene change or cuts in a video file☆11Oct 23, 2017Updated 8 years ago
- ☆13Jan 5, 2023Updated 3 years ago
- Repository for devcontainers of CPython☆13Feb 20, 2026Updated last week
- Web app which displays the daily and hourly sentiments for a stock (user to enter ticker as input). Stock sentiments are determined from…☆10Sep 26, 2022Updated 3 years ago
- This project implements the OSPF (Open Shortest Path First) network protocol in Python using Dijkstra's algorithm. Link-State Routing pr…☆12Sep 1, 2017Updated 8 years ago
- Gootool for Android☆13Jul 21, 2023Updated 2 years ago
- VOrg is a simple VS Code extension that provides a complete editing and preview experience for Org-mode documents, using VS Code to improve the org-mode editing experience☆17Feb 2, 2026Updated last month
- Examples for using the Pipl SEARCH API☆11Dec 19, 2023Updated 2 years ago
- ☆12Aug 30, 2022Updated 3 years ago
- An S3 hybrid storage interface for dat and hyperdrive☆13Jul 31, 2018Updated 7 years ago
- Visual hashes☆25Mar 21, 2017Updated 8 years ago
- A transient UI for Cargo, Rust's package manager☆11Dec 17, 2025Updated 2 months ago
- PyTorch implementation of various token mixers (attention mechanisms, MLP, etc.) for understanding computer vision papers and other tas…☆16Oct 7, 2024Updated last year