fake-name / IntraArchiveDeduplicator
Tool for managing data-deduplication within extant compressed archive files, along with a relatively performant BK tree implementation for fuzzy image searching.
☆102 Updated last year
Alternatives and similar repositories for IntraArchiveDeduplicator
Users who are interested in IntraArchiveDeduplicator are comparing it to the libraries listed below.
- ☆89 Updated last year
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆38 Updated 9 years ago
- Serving content from a WARC ☆62 Updated 12 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆47 Updated 7 years ago
- Implementation of perceptual image hash calculation in Python ☆132 Updated last year
- Github-Repository of the pHash.org library for perceptual hashing. ☆223 Updated 7 years ago
- An HTTP-based warc-to-zip converter ☆12 Updated 12 years ago
- A reverse image search algorithm which performs 2D affine transformation-invariant partial image-matching in sublinear time ☆290 Updated 5 years ago
- Rewriting web proxy and archival tool. At this point, it just tries to download all the things. ☆203 Updated last month
- bktree data structure with a Python interface for a CPP implementation ☆13 Updated 8 years ago
- Detect source resolution of upscaled images ☆254 Updated 3 weeks ago
- Hamming distance between hex strings in SQLite ☆25 Updated 7 years ago
- Saves proxied HTTP traffic to a WARC file. ☆29 Updated 11 years ago
- C++ implementation of hamming distance algorithm HmSearch using Kyoto Cabinet ☆42 Updated 9 years ago
- A Python Perceptual Image Hashing Module ☆213 Updated 2 years ago
- Naïve Bayesian Text Classifier on Redis ☆116 Updated 6 years ago
- python module for indexing tar files for fast access ☆77 Updated 9 years ago
- IMAP server based on Twitter statuses ☆55 Updated 15 years ago
- Perceptual hashing tools for detecting child sexual abuse material ☆184 Updated 3 weeks ago
- A Python FUSE file system that features transparent deduplication and compression which make it ideal for archiving backups. ☆139 Updated 14 years ago
- Open source software for image correlation, distance and analysis ☆61 Updated 2 years ago
- Remote client for distributed automated HTTP(s) content fetching. ☆78 Updated last month
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆61 Updated last year
- Encrypted backups (without the backups) ☆120 Updated 10 years ago
- WARC writing MITM HTTP/S proxy ☆415 Updated this week
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com… ☆130 Updated 4 months ago
- Trough: Big data, small databases. ☆42 Updated 11 months ago
- A simple Python wrapper for the archive.is capturing service ☆203 Updated 5 months ago
- search, dedupe, and media ingestion for mediachain ☆33 Updated 8 years ago
- Content-Disposition header support for Python ☆40 Updated last year