benhoyt / soft404
Soft 404 (dead page) detector in Python
☆13 · Updated 6 years ago
Alternatives and similar repositories for soft404:
Users interested in soft404 are comparing it to the libraries listed below.
- Shepherding our web archives from crawl to access. ☆10 · Updated last year
- Discord bot by Sanich for https://youtu.be/1lzPIhTaPDY ☆13 · Updated 3 years ago
- A collection of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages. ☆34 · Updated 9 years ago
- ☆26 · Updated last week
- Python script that can remove watermarks from TikTok videos ☆15 · Updated 4 years ago
- A Memento Aggregator CLI and Server in Go ☆62 · Updated last month
- This is a module to extract RDF from an HTML5 page annotated with microdata. The module implements the algorithm defined and published by th… ☆44 · Updated 2 years ago
- A Telegram bot by @ZauteKm to stream videos in Telegram voice chats of both groups and channels. Supports live streams, YouTube videos & Te… ☆22 · Updated 2 years ago
- Tooling that automates your Facebook interactions. ☆62 · Updated last year
- Python bindings to the Compact Language Detector ☆33 · Updated 4 years ago
- ☆8 · Updated 5 years ago
- A simple package allowing you to use WebGraph data in Python (via the Jython interpreter). ☆19 · Updated 4 years ago
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆26 · Updated 8 months ago
- Track changes to the news, where news is anything with an RSS feed ☆178 · Updated 4 years ago
- WARC and ARC indexing and discovery tools. ☆122 · Updated last month
- Adding links to full text in Wikipedia references ☆37 · Updated last year
- A collection of code, data and information related to our audit of TikTok. ☆21 · Updated 2 months ago
- A simple Python library for searching on DuckDuckGo. ☆36 · Updated last year
- Experimental continuous web crawler for web archiving ☆9 · Updated 2 years ago
- Process, enhance and evaluate multiple OCR outputs. ☆22 · Updated 5 months ago
- Command-line OAI-PMH harvester and client with built-in cache. ☆123 · Updated last week
- Perpetual Access To The Scholarly Record ☆119 · Updated 8 months ago
- A Python tool-chain to enable transfer of backed-up files between various cloud stores including Dropbox, AWS S3, Glacier, Google Cloud P… ☆9 · Updated 3 years ago
- Core Assignment Calculator / Research Project Calculator ☆10 · Updated 7 years ago
- Import a subset or a full Wikidata dump into a CouchDB database ☆21 · Updated 7 months ago
- Scripts and microservice to feed an ElasticSearch with Wikidata and Inventaire entities, and keep those up-to-date ☆41 · Updated 4 years ago
- Simple Python script to download video and music from TikTok ☆10 · Updated 4 years ago
- API implementation, User Interface, and more modules of the IPTC EXTRA project ☆12 · Updated 3 years ago
- A scraper to download TikTok videos ☆22 · Updated 5 years ago
- Command-line tool to extract a ranked list of relevant keywords from a corpus with the option of using either topic modeling or tf-idf sc… ☆40 · Updated 8 years ago