Norconex / collector-filesystem
Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data from sources ranging from local hard drives to network locations, and for storing it in various data repositories such as search engines.
☆22Updated 7 months ago
Alternatives and similar repositories for collector-filesystem:
Users who are interested in collector-filesystem are comparing it to the libraries listed below.
- A java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str…☆16Updated 8 years ago
- Web/FileSystem Crawler Library☆29Updated last week
- Open Source, Distributed, Big Data Enterprise Search Engine☆69Updated last month
- The legacy distributed object storage server developed by PitchPoint Solutions can store billions of large and small files using minimal …☆85Updated 2 years ago
- High-security graph database☆62Updated 2 years ago
- A library to store metadata of relational databases including the schema, statistics, and integrity constraints.☆25Updated 6 years ago
- Index and search PDF files using Apache Lucene and PDF Box☆43Updated 4 years ago
- Javascript library to talk to multiple OLAP backends from multiple frontends☆17Updated 12 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi…☆188Updated this week
- This is the facade for installation and access to the individual components☆15Updated 6 years ago
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 3 years ago
- Minimal web framework, implemented in Java, resembling node.js+express.☆12Updated 2 years ago
- Fast in-memory graph structure, powering Gephi☆75Updated 5 months ago
- Secure REST service to index, search, retrieve and aggregate content from heterogeneous sources.☆20Updated 6 months ago
- Enterprise backend as a service☆70Updated 6 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆45Updated 3 years ago
- Provides a modern and scalable web server as SIRIUS module☆22Updated 2 weeks ago
- The next generation of open source search☆90Updated 7 years ago
- Apache NiFi Custom Processor Extracting Text From Files with Apache Tika☆35Updated last year
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆24Updated 7 years ago
- An open source search engine for corporate data and websites.☆106Updated 7 years ago
- Mirror of Apache OpenNLP Add-ons☆17Updated last week
- A PDFBox fork intended to be used as PDF processor for Sejda and PDFsam☆50Updated last week
- A Java implementation of SpamSum / SSDeep☆14Updated 8 years ago
- metric² is a self-service BI tool using responsive web based dashboards to display any data from your SAP HANA Database, Platform or Appl…☆28Updated last year
- OrientDB Elastic Search Plugin☆9Updated 8 years ago
- A Java library for working with Frictionless Data Data Packages.☆22Updated last week
- Create a windows installer☆12Updated 9 years ago
- Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services☆26Updated last month
- Visualization of results returned by Solr 6 graph queries☆10Updated 8 years ago