bejean / crawl-anywhere
Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
☆96Updated 7 years ago
Alternatives and similar repositories for crawl-anywhere
Users that are interested in crawl-anywhere are comparing it to the libraries listed below
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition☆136Updated 9 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)☆215Updated 2 years ago
- Web Crawler for Elasticsearch☆235Updated 5 years ago
- Distributed Realtime Search with Lucene and MongoDB☆59Updated 7 years ago
- ☆66Updated 8 years ago
- solr-logstash☆43Updated 9 years ago
- Educational Example of a custom Lucene Query & Scorer☆48Updated 5 years ago
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop☆56Updated 4 years ago
- CSV format for Elasticsearch REST search responses☆42Updated 4 years ago
- How to spot first stories on Twitter using Storm.☆125Updated last year
- A Query Autofiltering SearchComponent for Solr that can translate free-text queries into structured queries using index metadata☆28Updated 6 years ago
- ☆18Updated 8 years ago
- Twitter River Plugin for elasticsearch (STOPPED)☆203Updated 9 months ago
- The (overall) documentation of the d:swarm platform (https://github.com/dswarm/dswarm-documentation/wiki)☆21Updated 9 years ago
- Blog crawler for the blogforever project.☆22Updated 11 years ago
- Example code for the book "Indexing Data in Apache Solr"☆43Updated 5 years ago
- Studio web tool☆126Updated last month
- An open source search engine for corporate data and websites.☆106Updated 7 years ago
- PredictionIO Classification Engine Template (Scala-based parallelized engine)☆39Updated 5 years ago
- Superfeedr powered pipes!☆131Updated 9 years ago
- Structured Data Extractor. An application to extract structured data from web pages. It uses Data Extraction Based on Partial Tree Alignm…☆49Updated 12 years ago
- Automatic, zero-config web scraping -- written in Java, has no dependency on Java EE or app servers, and the web scraper has a restful/JS…☆155Updated 7 years ago
- Solr Redis Extensions☆52Updated last year
- This is a Fact based Question Answering System using Apache Solr as backend search engine, Wikipedia dumps as information source, Apache …☆26Updated 2 years ago
- Solr Query Segmenter for structuring unstructured queries☆21Updated 4 years ago
- A pure JavaScript frontend for Elasticsearch search indices.☆79Updated 7 years ago
- Sentiment analysis framework developed by CERTH.☆22Updated 9 years ago
- Frequent Pattern Mining☆16Updated 8 years ago
- A JSON-aware Elasticsearch front end☆299Updated 11 years ago
- PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)☆74Updated 5 years ago