hollingsworthd / ScreenSlicer
Automatic, zero-config web scraping. Written in Java with no dependency on Java EE or app servers; the web scraper has a RESTful/JSON API. Currently unmaintained! DO NOT USE THIS CODE unless you know what you're doing.
☆156 · Updated 8 years ago
Alternatives and similar repositories for ScreenSlicer
Users who are interested in ScreenSlicer are comparing it to the repositories listed below.
- How to spot first stories on Twitter using Storm. ☆124 · Updated 2 years ago
- The pythonic way to code in Java. ☆120 · Updated 9 years ago
- Real-time Twitter sentiment analyzer engine ☆143 · Updated 11 years ago
- Blog crawler for the blogforever project. ☆23 · Updated 11 years ago
- ☆20 · Updated 8 years ago
- We introduce TACIT: An Open-Source Text Analysis, Crawling and Interpretation Tool. TACIT's plugin architecture has three main components… ☆109 · Updated 6 years ago
- Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration. ☆98 · Updated 8 years ago
- A component-based data flow framework with a drag-n-drop Web 2.0 interface. Based on Stackless Python and inspired by Yahoo! Pipes. ☆150 · Updated 13 years ago
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p… ☆142 · Updated 3 years ago
- A crawler to collect reviews and product information on Amazon.com ☆75 · Updated 9 years ago
- Convenient web RSS reader ☆51 · Updated last year
- Face Detection. Modified version of http://code.google.com/p/jviolajones/ ☆63 · Updated 9 years ago
- HTML Content / Article Extractor in Scala - open sourced from Gravity Labs - http://gravity.com ☆343 · Updated 6 years ago
- Java implementation of a Multinomial Naive Bayes text classifier. ☆96 · Updated 11 years ago
- Algorithmic summarizer for RSS/Atom feeds, web URLs and arbitrary text. Codebase for the application deployed at http://tldrzr.herokuapp.… ☆53 · Updated 9 years ago
- A small Java library for simple text analysis - counting strings, identifying languages, and removing stop words. ☆156 · Updated 6 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆223 · Updated 3 years ago
- Code examples on how to use the Datumbox Machine Learning Framework. ☆40 · Updated 2 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆95 · Updated 7 years ago
- Java-based implementation of an unofficial Google Trends API ☆94 · Updated 11 years ago
- Human-Powered Data Analysis with Mechanical Turk ☆300 · Updated 13 years ago
- OpenBlock is a web application and RESTful service that allows users to browse and search their local area for "hyper-local news" ☆61 · Updated 4 years ago
- Readability clone in Java ☆460 · Updated 5 years ago
- A fast and easy-to-use decision tree learner in Java ☆234 · Updated 3 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 · Updated 9 years ago
- RDF-Centric Map/Reduce Framework and Freebase data conversion tool ☆149 · Updated 4 years ago
- WARC (Web Archive) Input and Output Formats for Hadoop ☆37 · Updated 11 years ago
- A simple proxy web service in 19 lines of Python code. ☆23 · Updated 11 years ago
- ☆20 · Updated 8 years ago
- Scholar Ninja - Chrome extension. A distributed open search engine for scholarly content, based on a WebRTC DHT network ☆115 · Updated 9 years ago