Norconex / collector-filesystem
Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data from sources ranging from local hard drives to network locations, and for storing it in various data repositories such as search engines.
☆22Updated 3 months ago
Alternatives and similar repositories for collector-filesystem:
Users who are interested in collector-filesystem are comparing it to the libraries listed below:
- A Java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str…☆16Updated 7 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi…☆185Updated this week
- Automatically exported from code.google.com/p/xml2json-xslt☆38Updated 9 years ago
- Index and search PDF files using Apache Lucene and PDFBox☆43Updated 4 years ago
- Uses your app logs to visualize how the data moves between the code, database, HTTP services, message queue, external storages etc.☆23Updated 9 months ago
- Enterprise backend as a service☆70Updated 6 years ago
- Docker container to provide Apache Tika RESTful API☆40Updated 8 years ago
- Provenance: Linking and Understanding Sources☆17Updated 7 months ago
- An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)☆54Updated 7 years ago
- Core API for Silverpeas☆49Updated this week
- Open Source, Distributed, Big Data Enterprise Search Engine☆69Updated last month
- Solrstrap is a Query-Result interface for Solr written in JavaScript, HTML and CSS☆86Updated 7 years ago
- Palladian is a Java-based toolkit with functionality for text processing, classification, information extraction, and data retrieval from…☆38Updated this week
- Quick starts for Teiid WildFly☆25Updated 5 years ago
- Demonstration of searching PDF documents with Solr, Tika, and Tesseract☆30Updated 3 months ago
- Sensefy is a federated enterprise semantic search framework built on Apache ManifoldCF, Apache Solr and Apache Stanbol. Development is sp…☆15Updated 2 years ago
- Secure REST service to index, search, retrieve and aggregate content from heterogeneous sources.☆20Updated 3 months ago
- OrientDB Elasticsearch Plugin☆9Updated 8 years ago
- A PDFBox fork intended to be used as PDF processor for Sejda and PDFsam☆51Updated this week
- Home of RDF2Go and RDFReactor☆13Updated 8 years ago
- Simple API for working with complex data formats such as XML and JSON☆29Updated 3 months ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR).☆23Updated 10 years ago
- GUI tool to map any JSON-based Web API, plus node server to access it as if it were a HAL Hypermedia API☆27Updated 6 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆46Updated 3 years ago
- XML Director - XML Content Management☆15Updated last year
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 2 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess…☆15Updated 9 years ago
- Node.js-based proxy to make a Solr instance read-only.☆27Updated 8 years ago
- jdbcspy is a lightweight profiling and monitoring proxy for your JDBC connection. It can be configured very easily and will provide i…☆12Updated 4 months ago
- Greylock is an embedded search engine that is optimized for index size and performance☆12Updated 8 years ago