BitCurator / bitcurator-redact-pdf
A PDF redaction tool that employs named entity recognition.
☆17 · Updated 2 years ago
Alternatives and similar repositories for bitcurator-redact-pdf
Users interested in bitcurator-redact-pdf are comparing it to the libraries listed below.
- A search interface and wayback machine for the UKWA Solr-based warc-indexer framework. ☆131 · Updated this week
- A tool to detect whether a PDF has a bad redaction ☆152 · Updated 3 weeks ago
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆54 · Updated 7 years ago
- Web archive index server based on RocksDB ☆36 · Updated last month
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆129 · Updated 2 months ago
- URLTeam's second generation of URL shortener archiving tools ☆79 · Updated last month
- Web archiving using Google Chrome ☆46 · Updated 5 years ago
- CDXJ Indexing of WARC/ARCs ☆29 · Updated 10 months ago
- Classic LOCKSS System (LOCKSS 1.x) ☆67 · Updated this week
- Command line tool for digging into WARC files ☆46 · Updated this week
- Estimating the age of web resources ☆96 · Updated 4 months ago
- A listing of World Wide Web archives, for humans and machines, using the Web Archive Manifest (WAM) YAML format ☆53 · Updated 2 years ago
- 🦉 Agent for Flock, the privacy-preserving fleet management system ☆32 · Updated 5 years ago
- ☆27 · Updated 3 years ago
- A list of tools related to W(eb)ARC(hive) ☆64 · Updated 10 years ago
- Qubes-based SecureDrop Journalist Workstation environment for submission handling ☆158 · Updated last week
- DEPRECATED. Desktop graph visualization application ☆51 · Updated 3 years ago
- dangerzone has moved to https://github.com/freedomofpress/dangerzone ☆40 · Updated 4 years ago
- 📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆97 · Updated 7 years ago
- Trough: Big data, small databases. ☆40 · Updated last year
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 · Updated 4 years ago
- A Memento Aggregator CLI and Server in Go ☆69 · Updated 7 months ago
- Python library and supporting utilities to parse and process PST and mbox email sources ☆118 · Updated last month
- A list of things related to software, literature, and other content for 🕣 Memento ☆102 · Updated last year
- Now included in rigour ☆152 · Updated last month
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 · Updated 8 years ago
- Centralised repository for WARC usage specifications. ☆117 · Updated 2 weeks ago
- BitCurator Environment: Using, building, and maintaining BitCurator ☆61 · Updated last year
- The Toolkit API, app, and browser extension. Start preserving now. ☆47 · Updated this week
- Recover lost websites from the Web Infrastructure ☆89 · Updated 2 months ago