bejean / crawl-anywhere
Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
☆98 Updated 8 years ago
Alternatives and similar repositories for crawl-anywhere
Users who are interested in crawl-anywhere are comparing it to the libraries listed below.
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆221 Updated 2 years ago
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition ☆136 Updated 9 years ago
- ☆66 Updated 8 years ago
- Html Content / Article Extractor in Scala - open sourced from Gravity Labs - http://gravity.com ☆343 Updated 6 years ago
- Automatic, zero-config web scraping -- written in Java, has no dependency on Java EE or app servers, and the web scraper has a restful/JS… ☆155 Updated 8 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆195 Updated last week
- A Python library to detect and extract listing data from HTML pages. ☆108 Updated 8 years ago
- Twitter River Plugin for elasticsearch (STOPPED) ☆203 Updated last year
- Studio web tool ☆125 Updated 3 weeks ago
- How to spot first stories on Twitter using Storm. ☆124 Updated last year
- Naive Bayes Classifier implemented with Elasticsearch Aggregations ☆51 Updated 11 years ago
- A plugin for language detection in Elasticsearch using Nakatani Shuyo's language detector ☆252 Updated 7 years ago
- Small hack to draw date histogram facets as graph using nvd3.js ☆54 Updated 12 years ago
- Web Crawler for Elasticsearch ☆234 Updated 6 years ago
- A platform for backing crowdsourcing websites, built in golang for elasticsearch ☆360 Updated 5 years ago
- Lucene Auto Phrase TokenFilter implementation