benhoyt / soft404
Soft 404 (dead page) detector in Python
☆13 · Updated 7 years ago
Alternatives and similar repositories for soft404
Users that are interested in soft404 are comparing it to the libraries listed below
- Discord bot by Sanich for https://youtu.be/1lzPIhTaPDY ☆13 · Updated 4 years ago
- COSC 404 - Database System Implementation ☆31 · Updated last year
- The tech404.github.io website ☆18 · Updated 11 months ago
- A timezone converter for online events ☆19 · Updated 2 years ago
- A generic, machine learning-based revision scoring system for MediaWiki ☆91 · Updated 2 years ago
- A series of creative 404 pages designed to pleasantly surprise your users if an error were to ever occur. ☆77 · Updated 4 years ago
- GitHub mirror: our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing) ☆37 · Updated last year
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆28 · Updated last year
- Common Crawl fork of Apache Nutch ☆40 · Updated last week
- Perpetual Access To The Scholarly Record ☆120 · Updated last year
- Social Feed Manager user interface application. ☆157 · Updated last year
- Distributed similarity search ☆10 · Updated 5 years ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆198 · Updated 2 weeks ago
- 🔍 Mirror of https://gerrit.wikimedia.org/g/mediawiki/extensions/CirrusSearch. See https://www.mediawiki.org/wiki/Developer_access for co… ☆45 · Updated this week
- Social Media Analysis for Situation Awareness during Crises (SMASAC) Tutorial ☆25 · Updated 7 years ago
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t… ☆132 · Updated 2 months ago
- GitHub mirror: our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing) ☆111 · Updated last year
- Source real estate prices from the Common Crawl. ☆27 · Updated 7 years ago
- Tools to construct and process Common Crawl webgraphs ☆105 · Updated 2 weeks ago
- 📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆98 · Updated 7 years ago
- Bot for telegram for anonymous communication ☆33 · Updated 3 years ago
- Pageviews Analysis tool for Wikimedia Foundation wikis ☆153 · Updated this week
- Wikipedia Tools for Google Spreadsheets (Install: …) ☆157 · Updated last year
- Wikipedia Data Analysis Toolkit ☆26 · Updated 9 years ago
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed… ☆156 · Updated 4 months ago
- GitHub mirror: our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing) ☆50 · Updated 7 months ago
- A simple package allowing to use WebGraph data in Python (via the Jython interpreter). ☆20 · Updated 5 years ago
- MediaWiki extension to handle multilingual abstract content ☆78 · Updated last year
- How to find out who's popular for a particular group of Twitter users such as the Hacker News community. ☆39 · Updated 11 years ago
- 💭 Gobo: Your social media. Your rules. ☆111 · Updated 3 years ago