benhoyt / soft404
Soft 404 (dead page) detector in Python
☆13 Updated 6 years ago
Alternatives and similar repositories for soft404:
Users who are interested in soft404 are comparing it to the libraries listed below.
- A classifier for detecting soft 404 pages ☆57 Updated last year
- Shepherding our web archives from crawl to access. ☆10 Updated last year
- A generic, machine learning-based revision scoring system for MediaWiki ☆89 Updated last year
- Github mirror - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing) ☆34 Updated 8 months ago
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆25 Updated 7 months ago
- Sort-friendly URI Reordering Transform (SURT) python module ☆41 Updated 7 months ago
- An intelligent reading agent that understands text and translates it into Wikidata statements. ☆113 Updated 8 years ago
- produce a stream of citation data coming off wikimedia ☆12 Updated 7 years ago
- Experimental continuous web crawler for web archiving ☆9 Updated 2 years ago
- Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archive… ☆24 Updated 2 years ago
- WARC and ARC indexing and discovery tools. ☆122 Updated 6 months ago
- Adding links to full text in Wikipedia references ☆37 Updated last year
- Generates large collages of images using OpenSeadragon ☆48 Updated 10 months ago
- Import entities from another Wikibase instance (e.g. Wikidata) ☆28 Updated 4 years ago
- Python library for reading and writing warc files ☆239 Updated 2 years ago
- Sickle: OAI-PMH for Humans ☆108 Updated last year
- Web application for distributed compute analysis of Archive-It web archive collections. ☆15 Updated 6 months ago
- Python API for KB data-services ☆19 Updated 5 years ago
- ☆40 Updated 7 years ago
- Wikidata embedding ☆50 Updated 3 months ago
- Plots various graphs for a series of plaintext files in a directory ☆19 Updated 8 years ago
- Python bindings for the fast integer compression library FastPFor. ☆58 Updated last year
- Discord bot by Sanich for https://youtu.be/1lzPIhTaPDY ☆13 Updated 3 years ago
- Docker image for the Archives Unleashed Toolkit ☆12 Updated 2 years ago
- Simple pythonic JSON to JSON converter. ☆10 Updated last year
- World Wide Web site! For the Scholars' Lab! ☆12 Updated this week
- Hidden alignment conditional random field for classifying string pairs. ☆24 Updated 5 months ago
- A Vue component for crowdsourcing Web Annotations. ☆22 Updated last year
- search interface for scholarly works ☆84 Updated 7 months ago