BitCurator / bitcurator-redact-pdf
A PDF redaction tool that employs named entity recognition.
☆17 · Updated 2 years ago
Alternatives and similar repositories for bitcurator-redact-pdf
Users who are interested in bitcurator-redact-pdf are comparing it to the libraries listed below.
- Python library and supporting utilities to parse and process PST and mbox email sources ☆118 · Updated last week
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆131 · Updated last week
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆54 · Updated 7 years ago
- Web archive index server based on RocksDB ☆36 · Updated 3 weeks ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆130 · Updated 2 months ago
- ☆27 · Updated 3 years ago
- Trough: Big data, small databases. ☆40 · Updated last year
- Web archiving using Google Chrome ☆46 · Updated 5 years ago
- A listing of world wide web archives, for humans and machines using Web Archive Manifest (WAM) yaml format ☆53 · Updated 2 years ago
- ☆54 · Updated last year
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆188 · Updated last year
- A Memento Aggregator CLI and Server in Go ☆70 · Updated 8 months ago
- Converts WARC files to static HTML ☆49 · Updated 2 months ago
- A list of things related to software, literature, and other content for 🕣 Memento ☆102 · Updated last year
- Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects… ☆471 · Updated this week
- Classic LOCKSS System (LOCKSS 1.x) ☆67 · Updated last week
- Web Archiving Integration Layer: One-Click User Instigated Preservation ☆381 · Updated 8 months ago
- Specifications developed and maintained by the Webrecorder community. ☆136 · Updated last month
- Qubes component: app-linux-pdf-converter ☆59 · Updated last month
- BitCurator Environment: Using, building, and maintaining BitCurator ☆60 · Updated last year
- dangerzone has moved to https://github.com/freedomofpress/dangerzone ☆41 · Updated 4 years ago
- Recover lost websites from the Web Infrastructure ☆89 · Updated 3 months ago
- 📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆98 · Updated 7 years ago
- 🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ☆179 · Updated 2 months ago
- Social Feed Manager user interface application. ☆156 · Updated last year
- Archivematica storage service ☆39 · Updated this week
- A list of tools related to W(eb)ARC(hive) ☆64 · Updated 11 years ago
- Support for writing WARC files with Scrapy ☆23 · Updated 5 years ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆73 · Updated 8 months ago
- Analyze PDFs with colors (and YARA) ☆339 · Updated 2 weeks ago