18F / scrapebox
A simple, system-independent infrastructure for web scraping. It uses Vagrant's VirtualBox interface and Puppet provisioning to scrape web content into structured data quickly and easily, without modifying your core system.
☆24 Updated 11 years ago
Alternatives and similar repositories for scrapebox
Users that are interested in scrapebox are comparing it to the libraries listed below
- A component based data flow framework with a drag-n-drop Web 2.0 interface. Based on Stackless Python and inspired by Yahoo! Pipes. ☆150 Updated 13 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆47 Updated 7 years ago
- ☆36 Updated 2 years ago
- A pastebin for tables. ☆34 Updated 12 years ago
- Bringing sanity to world of messed-up data ☆66 Updated 11 years ago
- Twerp is the telephone hacker's toolkit. It's also a command-line app for Twilio, written in Python ☆27 Updated 5 years ago
- ScraperWiki Python library for scraping and saving data; in maintenance mode ☆158 Updated last week
- Write you a home page with bookmarks well-organized. ☆16 Updated 8 years ago
- Superfeedr powered pipes! ☆131 Updated 10 years ago
- Junk drawer of old scripts. ☆18 Updated 9 years ago
- A tool to graph who has sent you the most emails ☆17 Updated 8 years ago
- Open Source Social Media Monitoring And Engagement System Core/API ☆37 Updated 11 years ago
- Export a graph of links between items crawled by Scrapy, in DOT file format. ☆26 Updated 14 years ago
- Specialised bot for periodical grabs and video/audio/etc. webpage scrapes. ☆11 Updated 8 years ago
- Python library with common functionality for writing web scrapers ☆102 Updated 10 years ago
- scraper related helper functions ☆27 Updated 11 years ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders. ☆11 Updated 10 years ago
- Whit is an open source SMS service, which allows you to query CrunchBase, Wikipedia, and several other data APIs. ☆198 Updated 12 years ago
- craigslist blob service ☆92 Updated 8 years ago
- urllib2 wrapper to make life easier ☆32 Updated 13 years ago
- Friendly data search via Google Docs API ☆26 Updated 12 years ago
- Main repo for pinitto.me open source corkboard ☆63 Updated 5 years ago
- Serapis is a sentence identifier and modeling pipeline / built for Wordnik ☆24 Updated 9 years ago
- Taws - A personal and private web search engine ☆24 Updated 10 years ago
- a simple server that connects calls between citizens and their congress person using the Twilio API ☆67 Updated 4 years ago
- Viewers for statistics and dashboarding of Domain Search Engine data ☆126 Updated 9 years ago
- Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations ☆40 Updated last year
- Python scripts for scraping bus ticket data from the websites of BoltBus, Greyhound, Megabus, GoBus, Amtrak, Peterpan, and EasternTravel. ☆38 Updated 5 years ago
- AES encrypted password manager ☆185 Updated 11 years ago
- Tiny python web crawler ☆169 Updated 9 years ago