hollingsworthd / ScreenSlicer
Automatic, zero-config web scraping, written in Java with no dependency on Java EE or app servers; the web scraper exposes a RESTful/JSON API. Currently unmaintained.
☆155 · Updated 8 years ago
Alternatives and similar repositories for ScreenSlicer
Users interested in ScreenSlicer compare it to the libraries listed below.
- Blog crawler for the blogforever project. ☆23 · Updated 11 years ago
- How to spot first stories on Twitter using Storm. ☆124 · Updated last year
- ☆20 · Updated 8 years ago
- RDF-Centric Map/Reduce Framework and Freebase data conversion tool ☆149 · Updated 3 years ago
- Human-Powered Data Analysis with Mechanical Turk ☆300 · Updated 12 years ago
- Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration. ☆97 · Updated 8 years ago
- The pythonic way to code in Java. ☆120 · Updated 9 years ago
- A small Java library for simple text analysis - counting strings, identifying languages, and removing stop words. ☆156 · Updated 5 years ago
- A component based data flow framework with a drag-n-drop Web 2.0 interface. Based on Stackless Python and inspired by Yahoo! Pipes. ☆150 · Updated 12 years ago
- A simple proxy web service in 19 lines of Python code. ☆23 · Updated 10 years ago
- Sample code showing Tweet activity volume using Twitter's Enterprise full-archive search API. Built with Django, Tweet embeds and C3. ☆42 · Updated 4 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆220 · Updated 2 years ago
- Feed discovery to share :) ☆41 · Updated 8 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 · Updated 3 years ago
- Real-Time, Twitter sentiment analyzer engine ☆144 · Updated 11 years ago
- Akiva is a simple natural-language-processing, question-answering, artificial intelligence. ☆347 · Updated 11 years ago
- Code to transform Hillary's emails from raw PDF documents to a SQLite database ☆161 · Updated 9 years ago
- Html Content / Article Extractor in Scala - open sourced from Gravity Labs - http://gravity.com ☆343 · Updated 6 years ago
- A library for extracting tables from PDF files ☆89 · Updated 11 years ago
- Java based implementation of Unofficial Google Trends API ☆94 · Updated 10 years ago
- Sometimes sites make crawling hard. Selenium-crawler uses selenium automation to fix that. ☆125 · Updated 12 years ago
- [obsolete] Moved to https://github.com/rometools/rome ☆23 · Updated 9 years ago
- ScraperWiki Python library for scraping and saving data ☆158 · Updated 2 years ago
- scraper related helper functions ☆27 · Updated 11 years ago
- JAVA implementation of Multinomial Naive Bayes Text Classifier. ☆96 · Updated 10 years ago
- Vizlinc ☆15 · Updated 9 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆62 · Updated last month
- cron-like jobs for back-end systems ☆77 · Updated 7 years ago
- We introduce TACIT: An Open-Source Text Analysis, Crawling and Interpretation Tool. TACIT's plugin architecture has three main components… ☆109 · Updated 6 years ago
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p… ☆142 · Updated 3 years ago