Norconex / collector-filesystem
Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data from sources ranging from local hard drives to network locations, and for storing it in various data repositories such as search engines.
☆21Updated last year
Related projects:
- Apache NiFi Custom Processor Extracting Text From Files with Apache Tika☆34Updated last year
- Suite of tools for detecting changes in web pages and their rendering☆53Updated 9 months ago
- JavaScript library to talk to multiple OLAP backends from multiple frontends☆18Updated 11 years ago
- Index and search PDF files using Apache Lucene and PDFBox☆41Updated 3 years ago
- Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, what…☆32Updated last year
- Enterprise backend as a service☆69Updated 5 years ago
- Web/FileSystem Crawler Library☆28Updated last month
- Sensefy is a federated enterprise semantic search framework built on Apache ManifoldCF, Apache Solr and Apache Stanbol. Development is sp…☆15Updated 2 years ago
- An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)☆54Updated 6 years ago
- This is the facade for installation and access to the individual components☆16Updated 6 years ago
- Uses your app logs to visualize how the data moves between the code, database, HTTP services, message queue, external storages etc.☆23Updated 5 months ago
- UI for ZenQuery - Enterprise Backend as a Service☆10Updated 7 years ago
- A toolkit for clustering web pages based on various similarity measures.☆32Updated 2 years ago
- Provenance: Linking and Understanding Sources☆17Updated 3 months ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi…☆181Updated this week
- GUI tool to map any JSON-based Web API, plus node server to access it as if it were a HAL Hypermedia API☆27Updated 6 years ago
- A library to store metadata of relational databases including the schema, statistics, and integrity constraints.☆24Updated 6 years ago
- Open Source, Distributed, Big Data Enterprise Search Engine☆68Updated this week
- Greylock is an embedded search engine focused on index size and performance☆12Updated 7 years ago
- Mirror of Apache MetaModel Membrane☆16Updated 5 years ago
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 2 years ago
- Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services☆25Updated 3 months ago
- Solr Relevance Ranking Analysis and Visualization Tool☆17Updated 4 years ago
- A primitive REST API wrapper for SQL calls to a JDBC-compliant database☆19Updated 8 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆24Updated 6 years ago
- PdfJs-Annotator is a proof of concept project that integrates AnnotatorJs (http://annotatorjs.org/) with the PdfJs (https://mozilla.githu…☆22Updated 4 years ago
- Mirror of Apache OpenNLP Add-ons☆16Updated 3 weeks ago
- Scalable Optical Character Recognition with Apache NiFi and Tesseract☆31Updated 8 years ago
- Extra pluggable modules for Apache MetaModel (but licensed with LGPL)☆17Updated 2 years ago
- KnowledgeStore☆20Updated 6 years ago