mk-fg / image-deduplication-tool
Tool to detect (and get rid of) similar images using perceptual hashing (pHash lib)
☆83 Updated 8 years ago
Alternatives and similar repositories for image-deduplication-tool
Users who are interested in image-deduplication-tool are comparing it to the repositories listed below
- Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation fo… ☆104 Updated 2 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆47 Updated 7 years ago
- Rewriting web proxy and archival tool. At this point, it just tries to download all the things. ☆204 Updated this week
- unified cli for various saas image classification apis. ☆40 Updated 8 years ago
- A Python Perceptual Image Hashing Module ☆214 Updated 3 years ago
- Google Chrome Extension. Record All Browsing in Screenshots & Full Text. Search For Anything At Any Time. Never Forget Where You Read Som… ☆308 Updated 7 years ago
- Grabbing all news. ☆62 Updated 5 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆61 Updated last year
- A queue-controlled browser automation tool for improving web crawl quality ☆62 Updated last month
- Tool for downloading sets and photos from Flickr ☆245 Updated last week
- Identifies similar pictures on your local computer ☆78 Updated 5 years ago
- Esper instance for TV news analysis ☆40 Updated 2 years ago
- Easily archive important Reddit post threads onto your computer ☆59 Updated 3 years ago
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆38 Updated 9 years ago
- Mechanical Turk on your own machine. ☆207 Updated 10 months ago
- Implementation of perceptual image hash calculation in Python ☆132 Updated last year
- 📂🛡️Suite of tools for file fixity (data protection for long term storage⌛) using redundant error correcting codes, hash auditing and du… ☆146 Updated 2 weeks ago
- Bookmark and archive webpages from the command line ☆33 Updated 6 years ago
- Save a bunch of web pages as a self-contained, compressed archive file for offline storage and sharing. ☆35 Updated 12 years ago
- Suite of tools for detecting changes in web pages and their rendering ☆55 Updated last year
- Tag-based bookmark manager inspired by delicious and Pinboard ☆34 Updated 2 years ago
- Short script for removing watermarks from PDF files. Requires pdftk. ☆59 Updated 6 years ago
- Store and restore metadata from a filesystem. ☆175 Updated 2 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆96 Updated 7 years ago
- Automatic video summaries ☆265 Updated 7 years ago
- Rename images using deep learning ☆153 Updated 2 years ago
- Wget-compatible web downloader and crawler. ☆593 Updated last year
- One-Click User Instigated Preservation ☆128 Updated 6 years ago
- Web archiving using Google Chrome ☆47 Updated 5 years ago