fake-name / IntraArchiveDeduplicator
Tool for managing data deduplication within extant compressed archive files, along with a relatively performant BK-tree implementation for fuzzy image searching.
☆96 · Updated last year
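A BK-tree answers "find every stored hash within distance r of this query" without comparing against every entry, which is what makes fuzzy image search over perceptual hashes practical. Below is a minimal sketch of a BK-tree keyed on Hamming distance. It assumes each image is represented by a non-negative integer perceptual hash (for example a 64-bit pHash value). It is a generic illustration of the data structure, not this project's implementation, and the names `BKTree`, `insert`, `search` and `hamming` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two non-negative integer hashes."""
    return bin(a ^ b).count("1")


class BKNode:
    __slots__ = ("hash", "items", "children")

    def __init__(self, h: int, item: object) -> None:
        self.hash = h
        self.items = [item]  # several items may share an identical hash
        self.children: Dict[int, "BKNode"] = {}  # edge label = distance to this node


class BKTree:
    """Hypothetical BK-tree over integer hashes using Hamming distance."""

    def __init__(self) -> None:
        self.root: Optional[BKNode] = None

    def insert(self, h: int, item: object) -> None:
        if self.root is None:
            self.root = BKNode(h, item)
            return
        node = self.root
        while True:
            d = hamming(h, node.hash)
            if d == 0:  # exact duplicate hash: store alongside the existing node
                node.items.append(item)
                return
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(h, item)
                return
            node = child

    def search(self, h: int, max_dist: int) -> List[Tuple[int, object]]:
        """Return (distance, item) pairs within max_dist of h, closest first."""
        results: List[Tuple[int, object]] = []
        if self.root is None:
            return results
        stack = [self.root]
        while stack:
            node = stack.pop()
            d = hamming(h, node.hash)
            if d <= max_dist:
                results.extend((d, it) for it in node.items)
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for edge, child in node.children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results, key=lambda r: r[0])


if __name__ == "__main__":
    tree = BKTree()
    tree.insert(0b1011, "a.jpg")
    tree.insert(0b1001, "b.jpg")
    tree.insert(0b0100, "c.jpg")
    print(tree.search(0b1011, 1))  # [(0, 'a.jpg'), (1, 'b.jpg')]
```

The pruning step is what keeps lookups fast: for small search radii most subtrees are skipped. As the radius approaches the hash width, the search degrades toward a full scan.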
Related projects:
- Fast Hamming-distance range searches via the native GiST indexing facility in PostgreSQL ☆164 · Updated 4 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆43 · Updated 6 years ago
- Detect source resolution of upscaled images ☆235 · Updated 5 months ago
- Serving content from a WARC ☆60 · Updated 11 years ago
- GitHub repository of the pHash.org library for perceptual hashing. ☆221 · Updated 6 years ago
- ☆86 · Updated 8 months ago
- BK-tree data structure with a Python interface for a C++ implementation ☆13 · Updated 7 years ago
- Rewriting web proxy and archival tool. At this point, it just tries to download all the things. ☆198 · Updated this week
- Remote client for distributed automated HTTP(S) content fetching. ☆77 · Updated this week
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆39 · Updated 8 years ago
- Implementation of perceptual image hash calculation in Python ☆130 · Updated 10 months ago
- isk-daemon is an open source standalone server and library capable of adding content-based (visual) image searching to any image related … ☆137 · Updated 9 years ago
- A reverse image search algorithm which performs 2D affine transformation-invariant partial image-matching in sublinear time ☆288 · Updated 5 years ago
- Saves proxied HTTP traffic to a WARC file. ☆26 · Updated 10 years ago
- Tool to detect (and get rid of) similar images using perceptual hashing (pHash lib) ☆80 · Updated 7 years ago
- Making a reusable toolkit for writing seesaw scripts ☆69 · Updated last year
- An HTTP-based warc-to-zip converter ☆11 · Updated 11 years ago
- Python module for indexing tar files for fast access ☆72 · Updated 9 years ago
- Insanely fast JPEG/JPG thumbnail scaling with the minimum fuss and CPU overhead. It makes use of libjpeg features of being able to load … ☆261 · Updated last year
- A Python binding for libpuzzle. ☆45 · Updated 4 years ago
- Perceptual hashing tools for detecting child sexual abuse material ☆174 · Updated this week
- ☆67 · Updated 6 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆60 · Updated 4 years ago
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆44 · Updated 5 years ago
- Perceptual Hash project for Videos (MMAI Term Project) ☆27 · Updated 10 years ago
- Web archiving using Google Chrome ☆42 · Updated 4 years ago
- Serapis is a sentence identifier and modeling pipeline built for Wordnik ☆24 · Updated 8 years ago
- Unix de-duplicating archiver ☆121 · Updated 9 years ago
- AcoustID's web site and API ☆64 · Updated last month
- JPEG Optimization ☆55 · Updated 10 years ago