18F / scrapebox
A simple, system-independent infrastructure for web scraping. It uses Vagrant's VirtualBox interface and Puppet provisioning to scrape web content into structured data quickly and easily, without modifying your core system.
☆24 · Updated 10 years ago
Alternatives and similar repositories for scrapebox:
Users who are interested in scrapebox are comparing it to the libraries listed below.
- A WayBack Machine Time-Lapse Generator ☆29 · Updated 6 years ago
- A pastebin for tables. ☆34 · Updated 11 years ago
- ☆36 · Updated last year
- An example REST API with Django, Tastypie, xAuth and Heroku ☆72 · Updated 5 years ago
- 3bot is a software platform to build, configure and perform. ☆11 · Updated 6 years ago
- Google Cloud Datastore storage module for Botkit ☆13 · Updated 8 months ago
- Open Source Social Media Monitoring And Engagement System Core/API ☆36 · Updated 10 years ago
- Friendly data search via Google Docs API ☆26 · Updated 11 years ago
- ☆28 · Updated last year
- An admin webui for Sensu ☆86 · Updated 9 years ago
- External plugins for BotBot.me ☆32 · Updated 6 years ago
- Serapis is a sentence identifier and modeling pipeline / built for Wordnik ☆24 · Updated 8 years ago
- A more liberal autolink extension for python Markdown ☆20 · Updated 2 years ago
- Specialised bot for periodical grabs and video/audio/etc. webpage scrapes. ☆11 · Updated 7 years ago
- Twerp is the telephone hacker's toolkit. It's also a command-line app for Twilio, written in Python ☆26 · Updated 4 years ago
- Write you a home page with bookmarks well-organized. ☆16 · Updated 7 years ago
- Bringing sanity to the world of messed-up data ☆66 · Updated 10 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆46 · Updated 7 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆24 · Updated 8 years ago
- A tool for loading arbitrary content into Elasticsearch and serving that content on the web. ☆29 · Updated 9 years ago
- A docker'ized internal-only tor relay. ☆42 · Updated 9 years ago
- PDF Filler is a RESTful service (API) to aid in the completion of existing PDF-based forms and empowers web developers to use browser-bas… ☆36 · Updated 11 years ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders. ☆11 · Updated 9 years ago
- Export a graph of links between items crawled by Scrapy, in DOT file format. ☆26 · Updated 13 years ago
- Very simple Netflix API client ☆24 · Updated 14 years ago
- Ready or Not... ☆50 · Updated 7 years ago
- Node wrapper for the Discourse API ☆34 · Updated 3 years ago
- A portable, lightweight, locally-hosted IPv4 and IPv6 geolocation API/server ☆40 · Updated 6 years ago
- ProjectMonitor is a CI display aggregator. It displays the status of multiple Continuous Integration builds on a single web page. ☆17 · Updated 9 years ago
- ☆223 · Updated 9 years ago