18F / scrapebox
A simple, system-independent infrastructure for web scraping. It uses Vagrant (with VirtualBox) and Puppet provisioning to build and run scrapers that turn web content into structured data quickly and easily, without modifying your core system.
☆24 · Updated 10 years ago
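scrapebox's own scraping code is not shown on this page. As a hedged illustration of what turning "web content into structured data" can mean, here is a minimal sketch using only Python's standard-library `html.parser`, run directly on the host rather than inside a Vagrant box. `TableParser` and `table_to_records` are hypothetical names, and the sketch assumes a simple, non-nested HTML table whose first row holds the column headers.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collect the text of each <td>/<th> cell, grouped by <tr> row.

    Assumes a simple, non-nested table; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the cell's text fragments and trim surrounding whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html):
    """Hypothetical helper: map each body row to a dict keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

For example, `table_to_records("<table><tr><th>repo</th><th>stars</th></tr><tr><td>scrapebox</td><td>24</td></tr></table>")` returns `[{"repo": "scrapebox", "stars": "24"}]`. Cell values stay strings; converting types is left to the caller.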
Alternatives and similar repositories for scrapebox:
Users interested in scrapebox are comparing it to the libraries listed below.
- A pastebin for tables. ☆34 · Updated 11 years ago
- Python command-line tools, for increased fu. ☆46 · Updated 9 years ago
- A really simple WSGI way to serve static (or mixed) content. ☆30 · Updated 5 years ago
- Junk drawer of old scripts. ☆18 · Updated 8 years ago
- Twerp is the telephone hacker's toolkit. It's also a command-line app for Twilio, written in Python. ☆26 · Updated 4 years ago
- Automatically tag Pinboard bookmarks based on page text. ☆8 · Updated 9 years ago
- Django feeds provides an extensive database model for RSS feeds and a fault-tolerant parser. ☆30 · Updated 12 years ago
- A Wayback Machine time-lapse generator. ☆29 · Updated 6 years ago
- Django async media encoding. ☆9 · Updated 7 years ago
- Open-source social media monitoring and engagement system (core/API). ☆36 · Updated 10 years ago
- Specialised bot for periodic grabs and for scraping video, audio, and other webpage content. ☆11 · Updated 7 years ago
- A collection of Django extensions that add content-management facilities to Django projects. ☆40 · Updated 9 years ago
- Write yourself a home page with well-organized bookmarks. ☆16 · Updated 7 years ago
- 3bot is a software platform to build, configure and perform. ☆11 · Updated 6 years ago
- Framework for scraping legislative/government data. ☆85 · Updated 5 months ago
- Webhooks for Django *experimental* ☆63 · Updated 15 years ago
- A tool to graph who has sent you the most emails. ☆18 · Updated 7 years ago
- A Python version (almost a port) of ProPublica's TableFu. ☆233 · Updated 11 years ago
- Very simple Netflix API client. ☆24 · Updated 14 years ago
- A pyexcel plugin that presents and writes data in text formats using tabulate. ☆11 · Updated 7 years ago
- Python module to watch Twitter user pages or search results. ☆62 · Updated 10 years ago
- A utility that provides an entry point for integrating front-end designers into a Django project. ☆27 · Updated 7 years ago
- A Heroku buildpack for Pelican. ☆23 · Updated 2 years ago
- ☆36 · Updated last year
- Simple plugin to sniff inbound search terms from popular search engines. ☆37 · Updated 16 years ago
- Video indexing site. ☆217 · Updated 9 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆46 · Updated 6 years ago
- An autoscaling Python script for Heroku. ☆27 · Updated 12 years ago
- A native web-based client for Slack. ☆23 · Updated 7 years ago
- Feedbuffer buffers RSS and Atom syndication feeds, that is to say it caches new feed entries until the news aggregator requests them and … ☆19 · Updated 8 years ago