dpapathanasiou / CleanScrape
A no-nonsense web scraping tool which removes the crap and preserves the content in epub and pdf formats.
☆41Updated 10 years ago
Alternatives and similar repositories for CleanScrape
Users that are interested in CleanScrape are comparing it to the libraries listed below
- Update a local archive of your tweets.☆49Updated 13 years ago
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io☆38Updated 10 years ago
- Junk drawer of old scripts.☆18Updated 9 years ago
- A small command-line python script that creates a local backup of your Flickr data. It mirrors images, titles, description, tags, albums…☆56Updated 2 years ago
- Drive/Gmail/Calendar backups☆32Updated 5 years ago
- Back up the notes you’ve saved to Pinboard☆88Updated 5 months ago
- Search engine for subtitles☆10Updated 10 years ago
- Google Chrome Extension. Record All Browsing in Screenshots & Full Text. Search For Anything At Any Time. Never Forget Where You Read Som…☆309Updated 8 years ago
- Automatically chooses new tags for articles based on existing tagged items☆27Updated 7 years ago
- Your Access To Data☆73Updated 3 years ago
- Firefox or Chrome/Chromium bookmarks export to pretty HTML in Python☆16Updated 12 years ago
- Command-line tool to easily extract data from HTML or XML documents. Produces machine readable output.☆33Updated 6 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.☆48Updated 7 years ago
- litevault provides an ultra lightweight command line password manager written in a single python file☆28Updated 10 years ago
- Automatically exported from code.google.com/p/ccl-ssns☆47Updated 2 years ago
- hexdump(1) for Unicode data☆39Updated last year
- An adaptation of rss2mail that uses IMAP directly☆86Updated 4 years ago
- File Filer; sort files into structured directory tree. Tree can be structured based on various designs such as date (file modification ti…☆48Updated 8 years ago
- Start or attach to a process and monitor a customizable set of metrics (CPU, I/O, etc.)☆34Updated 8 years ago
- Command-line Python script to upload photos to Flickr☆108Updated 4 years ago
- Short script for removing watermarks from PDF files. Requires pdftk.☆59Updated 6 years ago
- NoPriv.py is a python script to back up any IMAP capable email account to an HTML archive, nicely browsable, instead of weird folders (Mail…☆348Updated 8 years ago
- Modular workflow assistant for book digitization☆132Updated 9 years ago
- Over-engineered tool for symlinking dotfiles☆37Updated 12 years ago
- A python script that looks for special lines in a markdown file and uses those lines to convert, clean up, and insert content from URLs i…☆16Updated 13 years ago
- ☆39Updated 4 years ago
- View browser history as a graph (Chrome extension)☆45Updated last year
- An eBook tool to extract ISBN or metadata from eBooks and rename them using an ISBN database and metadata☆29Updated 10 years ago
- A command-line syndication feed monitor☆49Updated 11 months ago
- A collection of small hacks I wrote over the years☆36Updated 3 years ago