codelibs / fess-crawler
Web/FileSystem Crawler Library
☆29 · Updated last week
Alternatives and similar repositories for fess-crawler
Users who are interested in fess-crawler are comparing it to the libraries listed below.
- Apache OpenNLP Sandbox ☆45 · Updated this week
- Elasticsearch plugin offering Neo4j integration for Personalized Search ☆157 · Updated 4 years ago
- Elasticsearch plugin for b-bit minhash algorithm ☆63 · Updated last year
- Building recommenders with Elastic Graph! ☆37 · Updated 5 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆220 · Updated 2 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆96 · Updated 7 years ago
- Pulsar Data Visualization, gets the data from Pulsar Reporting API, builds different charts and displays them in the browser. ☆53 · Updated 9 years ago
- Using latent Dirichlet allocation (LDA) in Apache Lucene ☆58 · Updated 12 years ago
- Skeleton for Meetup - Building your own recommendation engine in an hour ☆29 · Updated 4 years ago
- Document Enrichment plugin for Elasticsearch ☆27 · Updated 6 months ago
- This project deals with hierarchical classification of web pages based on dmoz dataset. ☆14 · Updated 11 years ago
- Neo4j ElasticSearch Integration ☆214 · Updated 4 years ago
- A web based data mining workflow platform with real-time analysis capabilities ☆49 · Updated 2 years ago
- open source big data integration, analytics, and visualization ☆420 · Updated 8 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆34 · Updated 2 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆194 · Updated 2 weeks ago
- Orchestration, Management and Monitoring of Data Processing ☆11 · Updated this week
- Big GeoSpatial Data Points Visualization Tool ☆19 · Updated 9 years ago
- Additional opennlp mapping type for elasticsearch in order to perform named entity recognition ☆136 · Updated 9 years ago
- Storm / Solr Integration ☆19 · Updated last year
- A toolkit for clustering web pages based on various similarity measures. ☆34 · Updated 3 years ago
- Apache NiFi NLP Processor ☆18 · Updated last year
- A quick Elasticsearch/Logstash/Kibana (ELK) 7.x environment to quickly ingest realtime filtered tweets, perform Natural Language Processi… ☆16 · Updated last year
- Suite of tools for detecting changes in web pages and their rendering ☆55 · Updated last year
- Apache Joshua ☆109 · Updated 5 years ago
- Text retrieval database based on simhash similarity search ☆24 · Updated 2 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆273 · Updated 3 years ago
- Additional convenience processors not found in core Apache NiFi ☆96 · Updated 3 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆38 · Updated last year
- Implementation of Vision Based Page Segmentation algorithm in Java ☆103 · Updated 5 years ago