hollingsworthd / ScreenSlicer
Automatic, zero-config web scraping, written in Java with no dependency on Java EE or app servers; the web scraper has a RESTful/JSON API. Currently unmaintained.
☆155 · Updated 7 years ago
Alternatives and similar repositories for ScreenSlicer:
Users who are interested in ScreenSlicer are comparing it to the libraries listed below.
- ☆20 · Updated 7 years ago
- A convenient web RSS reader ☆51 · Updated 11 months ago
- A simple proxy web service in 19 lines of Python code. ☆23 · Updated 10 years ago
- OpenBlock is a web application and RESTful service that allows users to browse and search their local area for "hyper-local news" ☆61 · Updated 3 years ago
- Faceted search engine for domain-specific exploration of the Web ☆45 · Updated 8 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 · Updated 9 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 · Updated 3 years ago
- Algorithmic summarizer for RSS/Atom feeds, web URLs and arbitrary text. Codebase for the application deployed at http://tldrzr.herokuapp.… ☆53 · Updated 8 years ago
- Blog crawler for the blogforever project. ☆22 · Updated 11 years ago
- ☆25 · Updated 9 years ago
- How to spot first stories on Twitter using Storm. ☆125 · Updated last year
- A component-based data flow framework with a drag-n-drop Web 2.0 interface. Based on Stackless Python and inspired by Yahoo! Pipes. ☆150 · Updated 12 years ago
- Akiva is a simple natural-language-processing, question-answering, artificial intelligence. ☆349 · Updated 11 years ago
- Download *ALL* the submissions from Hacker News ☆50 · Updated 11 years ago
- Python library with common functionality for writing web scrapers ☆102 · Updated 9 years ago
- A tool for calculating semantic similarity between words from a text corpus based on lexico-syntactic patterns. ☆28 · Updated 9 years ago
- Slinky, a high-performance web crawler / text analytics in Python, Redis, Hadoop, R, Gephi ☆41 · Updated 14 years ago
- Whit is an open source SMS service, which allows you to query CrunchBase, Wikipedia, and several other data APIs. ☆198 · Updated 11 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆60 · Updated last week
- IPython Notebook Cookbook for Deployment via Chef ☆41 · Updated 8 years ago
- A bot that, whenever you post MarkovME on Reddit, responds with a random sampling of your comments using a Markov chain. ☆17 · Updated 10 years ago
- Entry for the Third Annual GitHub Data Challenge ☆35 · Updated 10 years ago
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p… ☆142 · Updated 2 years ago
- Raw Benchmark Data for Popular Machine Learning Frameworks ☆56 · Updated 7 years ago
- ☆13 · Updated 9 years ago
- Additional OpenNLP mapping type for Elasticsearch to perform named entity recognition ☆136 · Updated 8 years ago
- Simple Python scripts to download all Hacker News submissions and comments and store them in a PostgreSQL database. ☆120 · Updated 7 years ago
- Real-Time, Twitter sentiment analyzer engine ☆144 · Updated 10 years ago
- Vizlinc ☆14 · Updated 9 years ago
- Sometimes sites make crawling hard. Selenium-crawler uses selenium automation to fix that. ☆125 · Updated 11 years ago