JustAnotherArchivist / little-things
The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.arpa.li instead
☆23 · Updated 4 years ago
Related projects
Alternatives and complementary repositories for little-things
- Decentralized web archiving ☆19 · Updated 6 years ago
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆49 · Updated last month
- how hard is it to get a list of all local news sites in the United States (LOL) ☆8 · Updated 4 years ago
- Scripts for Internet Archive ☆12 · Updated 4 years ago
- A financial disclosure data extraction tool. ☆13 · Updated last year
- webapp for unglue.it - A Free Ebook Foundation program ☆16 · Updated this week
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆27 · Updated 3 years ago
- Extract list of results from search engines pages as CSV with a bookmarklet directly within the browser ☆19 · Updated this week
- CLI implementation of httpreserve that can test links and retrieve internet archive replacements ☆10 · Updated this week
- Find rss, atom, xml, and rdf feeds on webpages ☆30 · Updated last month
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆50 · Updated 3 months ago
- Personal news feed: search for results on Reddit/Pinboard/Twitter/Hackernews and read as RSS ☆29 · Updated 2 months ago
- Grabbing all news. ☆62 · Updated 4 years ago
- Examples of bad data, especially from government. ☆22 · Updated 3 months ago
- Generate a list of your GitHub stars by topic - automatically! ☆71 · Updated last year
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆13 · Updated last year
- Bot for operating snscrape in #archivebot on efnet ☆10 · Updated 4 years ago
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆25 · Updated 3 months ago
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A… ☆15 · Updated last month
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆45 · Updated last week
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 · Updated 5 years ago
- Scrape Twitter API without authentication using Nitter. ☆61 · Updated 2 years ago
- Track changes to GraphQL APIs by git scraping their schemas ☆23 · Updated 2 weeks ago
- Awesome list dedicated to digital and data preservation tools, sources, services and so on. ☆20 · Updated 2 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆15 · Updated 9 months ago
- Webrecorder Automated In-Page Behavior Framework ☆12 · Updated 3 years ago
- A helper library full of URL-related heuristics. ☆64 · Updated last month
- API client for Aleph, supports bulk entity and document upload. ☆28 · Updated last month
- Full archive of IndieWeb chat log data files ☆13 · Updated this week
- This is the HeadQuarters of my digital info. HPI library got me inspired and I'm trying to play with the idea on a smaller scale for myse… ☆19 · Updated last year