benhoyt / soft404
Soft 404 (dead page) detector in Python
☆13Updated 6 years ago
Alternatives and similar repositories for soft404:
Users that are interested in soft404 are comparing it to the libraries listed below
- ☆9Updated 3 years ago
- ☆30Updated 11 months ago
- Discord bot by Sanich for https://youtu.be/1lzPIhTaPDY☆13Updated 3 years ago
- 404Games Wastelands V2 - Chernarus☆21Updated 11 years ago
- 404 Error Page - Astronaut☆21Updated 5 years ago
- CMPUT404-project-socialdistribution☆14Updated 2 years ago
- ☆26Updated last week
- Shepherding our web archives from crawl to access.☆10Updated last year
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.☆143Updated last year
- ☆14Updated last year
- CMPUT404-assignment-ajax☆9Updated 11 years ago
- Rig for deploying DocumentCloud viewers to S3.☆13Updated 3 years ago
- 🐦 Access Twitter data without an API key. [DEPRECATED]☆175Updated 6 years ago
- Collection of datasets for benchmarking filtered vector similarity retrieval☆42Updated last year
- Command-line tool to extract a ranked list of relevant keywords from a corpus with the option of using either topic modeling or tf-idf sc…☆40Updated 8 years ago
- Faster Learned Sparse Retrieval with Block-Max Pruning. ACM SIGIR 2024.☆20Updated this week
- A fuzzy matching & clustering library for python.☆26Updated last year
- The tech404.github.io website☆16Updated 2 months ago
- Source real estate prices from the Common Crawl.☆27Updated 6 years ago
- A simple package for using WebGraph data in Python (via the Jython interpreter).☆19Updated 4 years ago
- Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archive …☆26Updated 2 years ago
- Wikipedia Data Analysis Toolkit☆26Updated 8 years ago
- Generating Wikipedia article embeddings using Word2vec and reading sessions☆18Updated 8 years ago
- Prototype SOLR-powered web archive exploration UI.☆43Updated 4 years ago
- ☆24Updated 7 years ago
- ☆10Updated 9 years ago
- Experimental continuous web crawler for web archiving☆9Updated 2 years ago
- WARC and ARC indexing and discovery tools.☆123Updated 2 months ago
- Data and Documentation for Kaleida's Attention Index☆12Updated 6 years ago
- An easy-to-use python client for Google News feeds.☆50Updated 3 years ago