mk-fg / image-deduplication-tool
Tool to detect (and get rid of) similar images using perceptual hashing (pHash lib)
☆82 Updated 8 years ago
Alternatives and similar repositories for image-deduplication-tool
Users who are interested in image-deduplication-tool are comparing it to the libraries listed below
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆46 Updated 7 years ago
- Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation fo… ☆102 Updated last year
- Rewriting web proxy and archival tool. At this point, it just tries to download all the things. ☆203 Updated 2 weeks ago
- Grabbing all news. ☆62 Updated 5 years ago
- Suite of tools for detecting changes in web pages and their rendering ☆54 Updated last year
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- Implementation of perceptual image hash calculation in Python ☆131 Updated last year
- CLI utility to find near duplicate images and remove all but the best copy. ☆162 Updated this week
- gofeed is designed to extract full-text RSS feeds from websites which only provide partial feeds or none ☆9 Updated 10 years ago
- Esper instance for TV news analysis ☆40 Updated 2 years ago
- Short script for removing watermarks from PDF files. Requires pdftk. ☆59 Updated 6 years ago
- Encrypted backups (without the backups) ☆120 Updated 10 years ago
- Define simple search patterns in bulk to perform advanced matching on any string ☆56 Updated last year
- One-Click User Instigated Preservation ☆127 Updated 6 years ago
- Document Management System (scanner -> appengine blobs) ☆146 Updated 10 years ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆60 Updated 11 months ago
- Evaluating the performance and accuracy of ABBYY FineReader's OCR on Senate Financial Disclosure scanned forms ☆132 Updated 9 years ago
- Extract scenecuts from video files using ffmpeg ☆86 Updated last week
- Detect source resolution of upscaled images ☆250 Updated last week
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 Updated last year
- PDF Extraction Toolkit ☆41 Updated 4 years ago
- Google Chrome Extension. Record All Browsing in Screenshots & Full Text. Search For Anything At Any Time. Never Forget Where You Read Som… ☆307 Updated 7 years ago
- A scrapy spider to extract post, thread, and user information from a vBulletin forum to a MongoDB database. ☆32 Updated 9 years ago
- rsync algorithm in python ☆40 Updated last year
- Image histogram remapping ☆214 Updated 5 years ago
- Fast Near-Duplicate Image Search and Delete using pHash, t-SNE and KDTree. ☆158 Updated 2 years ago
- Python XMP Toolkit ☆97 Updated last year
- Sort your movies on filesystem by dates, ratings, etc using symlinks.