dpapathanasiou / CleanScrape
A no-nonsense web scraping tool which removes the crap and preserves the content in epub and pdf formats.
☆41 · Updated 10 years ago
Alternatives and similar repositories for CleanScrape
Users interested in CleanScrape are comparing it to the libraries listed below.
- Check out https://github.com/webrecorder/webrecorder for the newer version matching https://webrecorder.io ☆38 · Updated 10 years ago
- Junk drawer of old scripts. ☆18 · Updated 9 years ago
- Google Chrome Extension. Record All Browsing in Screenshots & Full Text. Search For Anything At Any Time. Never Forget Where You Read Som… ☆309 · Updated 8 years ago
- Recover lost websites from the Web Infrastructure ☆91 · Updated 5 months ago
- An attempt to document commonly believed misconceptions about Tor. ☆14 · Updated 8 years ago
- Drive/Gmail/Calendar backups ☆32 · Updated 5 years ago
- A small command-line python script that creates a local backup of your Flickr data. It mirrors images, titles, description, tags, albums… ☆56 · Updated 2 years ago
- View browser history as a graph (Chrome extension) ☆45 · Updated last year
- PageArchiver (previously called "Scrapbook for SingleFile") is a Chrome extension that helps to archive pages for offline reading ☆90 · Updated 12 years ago
- Auto backup your github stars and repos ☆13 · Updated 10 years ago
- Update a local archive of your tweets. ☆49 · Updated 13 years ago
- A monitoring device ☆79 · Updated 11 years ago
- One-Click User Instigated Preservation ☆129 · Updated 6 years ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆60 · Updated last year
- Parse a Facebook Message export and analyse it. ☆13 · Updated 7 years ago
- litevault provides an ultra lightweight command line password manager written in a single python file ☆28 · Updated 10 years ago
- Trough: Big data, small databases. ☆41 · Updated last year
- File Filer; sort files into structured directory tree. Tree can be structured based on various designs such as date (file modification ti… ☆48 · Updated 8 years ago
- ☆36 · Updated 2 years ago
- Extract list of results from search engines pages as CSV with a bookmarklet directly within the browser ☆29 · Updated last week
- The Email Privacy Tester ☆91 · Updated 9 years ago
- Parse OCR result files for pagenos, tables of contents, etc. ☆14 · Updated 14 years ago
- I'm Leselys, your very elegant RSS reader. ☆226 · Updated 5 years ago
- A Memento Aggregator CLI and Server in Go ☆76 · Updated 10 months ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆48 · Updated 7 years ago
- Every document published from the Snowden archive ☆73 · Updated 10 years ago
- Utilities to operate on lots of PDF files ☆25 · Updated 4 years ago
- Search engine for subtitles ☆10 · Updated 10 years ago
- Your Access To Data ☆73 · Updated 3 years ago
- A clean-room clone of the Fever RSS aggregator, focusing on the API ☆60 · Updated 3 years ago