codelibs / fess-crawler
Web/FileSystem Crawler Library
☆29 Updated this week
Alternatives and similar repositories for fess-crawler
Users who are interested in fess-crawler are comparing it to the libraries listed below.
- Web Crawler for Elasticsearch ☆234 Updated 5 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆96 Updated 7 years ago
- Document Enrichment plugin for Elasticsearch ☆27 Updated 5 months ago
- Vert.x web and commandline application to import CSV/XLS/XLSX files into ElasticSearch. ☆118 Updated 4 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆219 Updated 2 years ago
- Elasticsearch plugin for b-bit minhash algorithm ☆63 Updated last year
- Implementation of Vision Based Page Segmentation algorithm in Java ☆103 Updated 5 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆194 Updated this week
- Using latent Dirichlet allocation (LDA) in Apache Lucene ☆58 Updated 12 years ago
- Elasticsearch plugin offering Neo4j integration for Personalized Search ☆157 Updated 4 years ago
- Suite of tools for detecting changes in web pages and their rendering ☆55 Updated last year
- Open-domain question answering system from UNC Charlotte ☆61 Updated 9 years ago
- Distributed text analysis suite based on Celery ☆96 Updated 2 years ago
- A POC at replicating Facebook Graph Search with Cypher and Neo4j ☆101 Updated 12 years ago
- Open-source Enterprise Grade Search Engine Software ☆509 Updated 3 years ago
- Skeleton for Meetup - Building your own recommendation engine in an hour ☆29 Updated 4 years ago
- An open source search engine for corporate data and websites. ☆106 Updated 8 years ago
- A bundle of html content extraction algorithms ☆122 Updated 10 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆38 Updated last year
- Pulsar Data Visualization, gets the data from Pulsar Reporting API, builds different charts and displays them in the browser. ☆53 Updated 9 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆271 Updated 2 years ago
- A simple scoring plugin for vector in Elasticsearch. ☆70 Updated 8 years ago
- Building recommenders with Elastic Graph! ☆37 Updated 4 years ago
- Storm / Solr Integration ☆19 Updated last year
- Integration between Stanford NLP and Apache Stanbol ☆34 Updated 9 years ago
- Text retrieval database based on simhash similarity search ☆24 Updated 2 years ago
- Machine learning components for Apache UIMA ☆131 Updated 2 years ago
- Carrot2 plugin for ElasticSearch ☆291 Updated 2 years ago
- Solr Relevance Ranking Analysis and Visualization Tool ☆15 Updated 5 years ago
- Computer and Humans Learn Mutually (Fast way to label text) ☆11 Updated 7 years ago