Norconex / collector-filesystem
Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data from sources ranging from local hard drives to network locations, and for storing it in various data repositories such as search engines.
☆22Updated 7 months ago
Alternatives and similar repositories for collector-filesystem
Users who are interested in collector-filesystem are comparing it to the repositories listed below.
- Open Source, Distributed, Big Data Enterprise Search Engine☆69Updated this week
- A library to store metadata of relational databases including the schema, statistics, and integrity constraints.☆25Updated 6 years ago
- An open source search engine for corporate data and websites.☆106Updated 7 years ago
- Automatically exported from code.google.com/p/xml2json-xslt☆38Updated 10 years ago
- Core API for Silverpeas☆49Updated this week
- Python wrapper for Apache Tika, made to be easy_installed☆25Updated 13 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi…☆188Updated this week
- JavaScript library to talk to multiple OLAP backends from multiple frontends☆17Updated 12 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆24Updated 7 years ago
- OrientDB Elastic Search Plugin☆9Updated 8 years ago
- Fast in-memory graph structure, powering Gephi☆75Updated last week
- Web/FileSystem Crawler Library☆29Updated last month
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 3 years ago
- Netarchivesuite development☆21Updated last month
- Work in progress: a new visualization engine☆34Updated 11 months ago
- Extra pluggable modules for Apache MetaModel (but licensed with LGPL)☆17Updated 3 years ago
- A PDFBox fork intended to be used as PDF processor for Sejda and PDFsam☆50Updated last week
- A simple CMIS 1.1 server based on chemistry opencmis☆16Updated 6 months ago
- Uses your app logs to visualize how the data moves between the code, database, HTTP services, message queue, external storages etc.☆23Updated last year
- Database smell detector☆13Updated 7 years ago
- Common web archive utility code.☆55Updated 2 months ago
- High-security graph database☆62Updated 2 years ago
- Mirror of Apache MetaModel Membrane☆16Updated 5 years ago
- Grok is a simple tool that allows you to easily parse logs☆40Updated 11 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.☆34Updated 2 years ago
- A Java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str…☆16Updated 8 years ago
- Solr Relevance Ranking Analysis and Visualization Tool☆17Updated 5 years ago
- An HTTP proxy for Elasticsearch, Solr (etc.) to prevent a 100% full disk situation.☆11Updated 6 years ago
- Web application to download and schedule reports from Elasticsearch☆11Updated 8 years ago
- An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)☆54Updated 7 years ago