bejean / crawl-anywhere
Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
☆98Updated 8 years ago
Alternatives and similar repositories for crawl-anywhere
Users that are interested in crawl-anywhere are comparing it to the libraries listed below
- ☆66Updated 9 years ago
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition☆136Updated 9 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)☆223Updated 3 years ago
- A Python library to detect and extract listing data from HTML pages.☆108Updated 8 years ago
- Twitter River Plugin for elasticsearch (STOPPED)☆203Updated last year
- Html Content / Article Extractor in Scala - open sourced from Gravity Labs - http://gravity.com☆343Updated 6 years ago
- A pure JavaScript frontend for ElasticSearch search indices.☆80Updated 7 years ago
- Automatic, zero-config web scraping -- written in Java, has no dependency on Java EE or app servers, and the web scraper has a restful/JS…☆156Updated 8 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi…☆196Updated 3 weeks ago
- A plugin for language detection in Elasticsearch using Nakatani Shuyo's language detector☆252Updated 8 years ago
- A Query Autofiltering SearchComponent for Solr that can translate free-text queries into structured queries using index metadata☆26Updated 7 years ago
- Carrot2 plugin for ElasticSearch☆294Updated 3 years ago
- Web Crawler for Elasticsearch☆235Updated 6 years ago
- Solrstrap is a Query-Result interface for Solr written in JavaScript, HTML and CSS☆87Updated 8 years ago
- Structured Data Extractor. An application to extract structured data from web pages. It uses Data Extraction Based on Partial Tree Alignm…☆49Updated 13 years ago
- A JavaScript framework for creating user interfaces to Solr.☆655Updated 4 years ago
- A platform for backing crowdsourcing websites, built in golang for elasticsearch☆360Updated 5 years ago
- Open-source Enterprise Grade Search Engine Software☆512Updated 3 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.☆283Updated 7 years ago
- Feed discovery to share :)☆41Updated 9 years ago
- FacetView is a pure javascript frontend for ElasticSearch.☆291Updated 10 years ago
- A bundle of useful Elasticsearch plugins☆112Updated last year
- A text tagger based on Lucene / Solr, using FST technology☆177Updated 2 years ago
- Old and outdated version of RapidMiner Studio 5. See rapidminer-studio for the latest version 7.x☆122Updated 11 years ago
- Lucene Auto Phrase TokenFilter implementation☆59Updated 7 years ago
- The WikiBrain Java library enables researchers and developers to incorporate state-of-the-art Wikipedia-based algorithms and technologies…☆95Updated 7 years ago
- Mapper Attachments Type plugin for Elasticsearch☆504Updated 2 years ago
- open source big data integration, analytics, and visualization☆420Updated 8 years ago
- ☆185Updated 7 years ago
- Dice Solr Plugins from Simon Hughes Dice.com☆88Updated 4 years ago