BitCurator / bitcurator-redact-pdf
A PDF redaction tool that employs named entity recognition.
☆15 · Updated 2 years ago
Alternatives and similar repositories for bitcurator-redact-pdf
Users who are interested in bitcurator-redact-pdf are comparing it to the libraries listed below:
- A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆95 · Updated 6 years ago
- A list of things related to software, literature, and other content for Memento ☆99 · Updated last year
- Web archiving using Google Chrome ☆44 · Updated 5 years ago
- A Memento Aggregator CLI and Server in Go ☆65 · Updated 3 months ago
- Save data from Google Takeout to a SQLite database ☆109 · Updated last year
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆54 · Updated this week
- URLTeam's second generation of URL shortener archiving tools ☆76 · Updated last month
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆51 · Updated 6 years ago
- A listing of world wide web archives, for humans and machines, using the Web Archive Manifest (WAM) YAML format ☆53 · Updated 2 years ago
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 · Updated 4 years ago
- The Toolkit API, app, and browser extension. Start preserving now. ☆47 · Updated 3 weeks ago
- Recover lost websites from the Web Infrastructure ☆89 · Updated 4 years ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆124 · Updated 5 months ago
- A simple Python wrapper and command-line interface for archive.org's "Save Page Now" capturing service ☆180 · Updated 8 months ago
- A diagram of my personal infrastructure ☆49 · Updated 4 years ago
- Discusses how to verify DKIM signatures in old emails, namely one of the Hunter Biden emails in the news ☆101 · Updated 2 years ago
- A list of tools related to W(eb)ARC(hive) ☆62 · Updated 10 years ago
- Web archive index server based on RocksDB ☆34 · Updated last month
- Command line tool for digging into WARC files ☆40 · Updated 2 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆42 · Updated last week
- Serving content from a WARC ☆61 · Updated 12 years ago
- Qubes component: app-linux-pdf-converter ☆56 · Updated 3 weeks ago
- Centralised repository for WARC usage specifications. ☆115 · Updated 7 months ago
- Classic LOCKSS System (LOCKSS 1.x) ☆66 · Updated this week
- Anti-stylometry tool based on PSAL's Anonymouth ☆11 · Updated 5 years ago
- Bot for operating snscrape in #archivebot on efnet ☆10 · Updated 5 years ago
- Data cleaning and validation functions for names, languages, identifiers, etc. ☆27 · Updated this week
- Qubes-based SecureDrop Journalist Workstation environment for submission handling ☆150 · Updated this week
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆60 · Updated 11 months ago
- Track changes to the news, where news is anything with an RSS feed ☆178 · Updated 5 years ago