hollingsworthd / ScreenSlicer
Automatic, zero-config web scraping -- written in Java, with no dependency on Java EE or app servers, and with a RESTful/JSON API for the web scraper. Currently unmaintained.
☆155 Updated 8 years ago
Alternatives and similar repositories for ScreenSlicer
Users who are interested in ScreenSlicer are comparing it to the libraries listed below.
- Java based implementation of Unofficial Google Trends API ☆94 Updated 10 years ago
- How to spot first stories on Twitter using Storm. ☆124 Updated last year
- A small Java library for simple text analysis - counting strings, identifying languages, and removing stop words. ☆156 Updated 5 years ago
- Readability clone in Java ☆460 Updated 4 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 Updated 9 years ago
- Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration. ☆97 Updated 8 years ago
- A component based data flow framework with a drag-n-drop Web 2.0 interface. Based on Stackless Python and inspired by Yahoo! Pipes. ☆150 Updated 12 years ago
- Algorithmic summarizer for RSS/Atom Feeds, Web Urls and arbitrary text. Codebase for the application deployed at http://tldrzr.herokuapp.… ☆53 Updated 8 years ago
- Blog crawler for the blogforever project. ☆23 Updated 11 years ago
- Java implementation of Multinomial Naive Bayes Text Classifier. ☆96 Updated 10 years ago
- XTractor is an algorithmic text extractor from web pages written in Java. It builds upon the "commonly used web design practices" approac… ☆43 Updated 9 years ago
- The pythonic way to code in Java. ☆120 Updated 9 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆218 Updated 2 years ago
- Real-Time, Twitter sentiment analyzer engine ☆144 Updated 11 years ago
- scraper related helper functions ☆27 Updated 11 years ago
- A fast and easy to use decision tree learner in Java ☆234 Updated 3 years ago
- Collects multimedia content shared through social networks. ☆19 Updated 10 years ago
- ☆20 Updated 8 years ago
- Html Content / Article Extractor in Scala - open sourced from Gravity Labs - http://gravity.com ☆343 Updated 6 years ago
- A crawler to collect reviews and product information on Amazon.com ☆75 Updated 9 years ago
- Recommendations Serving Engine using Python ☆28 Updated 10 years ago
- Human-Powered Data Analysis with Mechanical Turk ☆300 Updated 12 years ago
- Akiva is a simple natural-language-processing, question-answering, artificial intelligence. ☆347 Updated 11 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆194 Updated this week
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p… ☆142 Updated 3 years ago
- cron-like jobs for back-end systems ☆77 Updated 6 years ago
- An interactive map of Stack Exchange tags for all sites. ☆126 Updated 2 years ago
- Deprecated. Formerly: scripts to make it easier to set up and manipulate clusters at Amazon EC2 ☆110 Updated 13 years ago
- Scholar Ninja - Chrome extension. A distributed open search engine for scholarly content, based on a WebRTC DHT network ☆115 Updated 9 years ago
- RDF-Centric Map/Reduce Framework and Freebase data conversion tool ☆149 Updated 3 years ago