hollingsworthd / ScreenSlicer
Automatic, zero-config web scraping -- written in Java, has no dependency on Java EE or app servers, and the web scraper has a RESTful/JSON API. Currently unmaintained.
☆155Updated 8 years ago
Alternatives and similar repositories for ScreenSlicer
Users who are interested in ScreenSlicer are comparing it to the libraries listed below.
- Mavenized version of Kelvin Tan's example (http://www.lucenetutorial.com/lucene-in-5-minutes.html)☆70Updated 6 months ago
- cron-like jobs for back-end systems☆77Updated 6 years ago
- How to spot first stories on Twitter using Storm.☆125Updated last year
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p…☆142Updated 3 years ago
- YCB Java☆27Updated 2 years ago
- Lucene plugin for indexing and searching files stored in Baratine distributed filesystem☆16Updated 9 years ago
- Blog crawler for the blogforever project.☆22Updated 11 years ago
- An open source search engine for corporate data and websites.☆106Updated 8 years ago
- Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.☆96Updated 8 years ago
- A collection of efficient utilities for a data scientist.☆41Updated 10 years ago
- [obsolete] Moved to https://github.com/rometools/rome☆23Updated 9 years ago
- The Cognitive Foundry is an open-source Java library for building intelligent systems using machine learning☆134Updated 4 years ago
- A crawler to collect reviews and product information on Amazon.com☆75Updated 9 years ago
- Sikuli-Slides is a visual automation tool that enables users to automate and test Graphical User Interfaces (GUIs) using presentation sli…☆66Updated 9 years ago
- Algorithms that build k-nearest neighbors graph (k-nn graph): Brute-force, NN-Descent,...☆34Updated 6 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)☆216Updated 2 years ago
- A component based data flow framework with a drag-n-drop Web 2.0 interface. Based on Stackless Python and inspired by Yahoo! Pipes.☆150Updated 12 years ago
- Java implementation of Multinomial Naive Bayes Text Classifier.☆95Updated 10 years ago
- A fast and easy-to-use decision tree learner in Java☆233Updated 3 years ago
- Simplified scalable aggregation and processing framework built upon Apache Camel.☆22Updated 6 years ago
- A model-view based code generator written in Java☆40Updated 8 years ago
- OpenBlock is a web application and RESTful service that allows users to browse and search their local area for "hyper-local news"☆61Updated 4 years ago
- A queue-controlled browser automation tool for improving web crawl quality☆61Updated 4 months ago
- Collects multimedia content shared through social networks.☆19Updated 10 years ago
- Secure REST service to index, search, retrieve and aggregate content from heterogeneous sources.☆20Updated 9 months ago
- WARC (Web Archive) Input and Output Formats for Hadoop☆36Updated 10 years ago
- Recommendations Serving Engine using Python☆28Updated 9 years ago
- A stream of deduplicated tweets built using RxJava and Twitter4J☆10Updated 9 years ago
- Deprecated, use project scrape-itebooks☆32Updated 9 years ago
- A flexible pure-Java OCR implementation. Eventually.☆20Updated 10 years ago