bejean / crawl-anywhere
Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
☆96 Updated 8 years ago
Alternatives and similar repositories for crawl-anywhere
Users who are interested in crawl-anywhere are comparing it to the libraries listed below.
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition ☆136 Updated 9 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆216 Updated 2 years ago
- ☆66 Updated 8 years ago
- a pure javascript frontend for ElasticSearch search indices. ☆80 Updated 7 years ago
- A plugin for language detection in Elasticsearch using Nakatani Shuyo's language detector ☆252 Updated 7 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆190 Updated 3 weeks ago
- Web Crawler for Elasticsearch ☆235 Updated 5 years ago
- FacetView is a pure javascript frontend for ElasticSearch. ☆291 Updated 10 years ago
- Twitter River Plugin for elasticsearch (STOPPED) ☆204 Updated 10 months ago
- Crawler is a bare-bones spider designed to quickly and effectively build an index of all files and pages on a given Web site as well as t… ☆91 Updated 13 years ago
- ☆28 Updated 9 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop. ☆282 Updated 7 years ago
- Analysis plugin for ElasticSearch providing capability for processing inline annotations in documents. ☆35 Updated 11 years ago
- ☆284 Updated 3 years ago
- A Query Autofiltering SearchComponent for Solr that can translate free-text queries into structured queries using index metadata ☆28 Updated 6 years ago
- The template project for three way and five way sentiment classification ☆11 Updated 8 years ago
- Natural Language Processing Toolkit for PHP ☆134 Updated 12 years ago
- The (overall) documentation of the d:swarm platform (https://github.com/dswarm/dswarm-documentation/wiki) ☆21 Updated 9 years ago
- solr-logstash ☆43 Updated 9 years ago
- Solr AutoComplete implementation ☆59 Updated 7 years ago
- An extension to the demo template of ElasticUI a beautiful AngularJS frontend to ElasticSearch for faceted navigation ☆39 Updated 10 years ago
- Automatic, zero-config web scraping -- written in Java, has no dependency on Java EE or app servers, and the web scraper has a restful/JS… ☆155 Updated 8 years ago
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop ☆56 Updated 4 years ago
- Web application for the visual composition of dashboards ☆100 Updated 8 years ago
- Solr Query Segmenter for structuring unstructured queries ☆22 Updated 4 years ago
- Html Content / Article Extractor in Scala - open sourced from Gravity Labs - http://gravity.com ☆343 Updated 5 years ago
- An open source search engine for corporate data and websites. ☆106 Updated 7 years ago
- Dice Solr Plugins from Simon Hughes Dice.com ☆87 Updated 4 years ago
- A platform for backing crowdsourcing websites, built in golang for elasticsearch ☆359 Updated 4 years ago
- a full cross platform video screen capture tool and host. Java based screen recorder, and Django based web backend. Also included is a di… ☆105 Updated 8 years ago