mikekestemont / ruzicka
☆12 · Updated last year
Related projects
Alternatives and complementary repositories for ruzicka
- A PDF classifier ensemble with REST API service · ☆23 · Updated 3 years ago
- Simplified version of a common crawl fetcher · ☆12 · Updated this week
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives · ☆13 · Updated 3 years ago
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki · ☆25 · Updated 3 months ago
- Legal document classification with EuroVoc descriptors on 22 languages. · ☆25 · Updated last year
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format · ☆45 · Updated 6 years ago
- CLI implementation of httpreserve that can test links and retrieve internet archive replacements · ☆9 · Updated last year
- hexdump(1) for Unicode data · ☆38 · Updated 2 months ago
- A set of utilities for processing MediaWiki XML dump data. · ☆45 · Updated 3 months ago
- Artifacts from the DARPA-funded SafeDocs research program · ☆22 · Updated last year
- Chrome extension that disables WebBluetooth · ☆14 · Updated 6 years ago
- Sort-friendly URI Reordering Transform (SURT) python module · ☆40 · Updated 3 months ago
- In-browser OCR of Ancient Greek and Latin · ☆23 · Updated last week
- R package for stylometric analyses · ☆173 · Updated 2 months ago
- An attempt to document commonly believed misconceptions about Tor. · ☆14 · Updated 7 years ago
- Z39.50/SRU router · ☆15 · Updated last month
- A Memento Aggregator CLI and Server in Go · ☆57 · Updated 5 months ago
- An index of PDF-centric corpora · ☆107 · Updated last month
- A software to detect text reuse with BLAST. · ☆14 · Updated 5 years ago
- Python Multilingual Ucrel Semantic Analysis System · ☆30 · Updated 2 months ago
- Quick and dirty script to parse bplists with Ruby · ☆11 · Updated 4 years ago
- Mass DNS resolution tool · ☆36 · Updated 3 years ago
- DHLAB is a library of python modules for accessing text and pictures at the National Library of Norway. · ☆20 · Updated last week
- Notes and reference for ongoing forecasting. · ☆16 · Updated 2 years ago
- Support for writing WARC files with Scrapy · ☆20 · Updated 4 years ago
- search interface for scholarly works · ☆80 · Updated 3 months ago
- Various modules to implement the DetecTor design from http://detector.kuix.de · ☆52 · Updated 8 years ago
- An adversarial stylometry application that seeks to help users evade authorship attribution. · ☆19 · Updated 9 years ago
- Code release for: Cookies that give you away: The surveillance implications of web tracking · ☆53 · Updated 5 years ago
- Some code to examine and modify your experience of Twitter. · ☆11 · Updated 4 years ago