Norconex / collector-filesystem
Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data from sources ranging from local hard drives to network locations, and for storing it in data repositories such as search engines.
☆24 Updated last year
Alternatives and similar repositories for collector-filesystem
Users who are interested in collector-filesystem are comparing it to the libraries listed below.
- A pure JavaScript frontend for ElasticSearch search indices. ☆80 Updated 7 years ago
- An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki) ☆54 Updated 7 years ago
- Visualization of interaction between entities ☆16 Updated 9 years ago
- JSONiq Implementation that compiles to JavaScript ☆66 Updated 3 years ago
- SOLR bulk indexing utility for the command line. ☆45 Updated 2 weeks ago
- Browser version of Hyphe (WIP) ☆31 Updated 6 months ago
- This is the facade for installation and access to the individual components ☆15 Updated 7 years ago
- Simple taxonomy management tool and document classifier. ☆56 Updated 5 years ago
- Work in progress: a new visualization engine ☆34 Updated 3 months ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆275 Updated 3 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆47 Updated 3 years ago
- Crawling GitHub data ☆32 Updated last year
- Explore networks and publish narratives. ☆52 Updated 4 years ago
- Interactive network visualization ☆103 Updated last week
- Zorba - the NoSQL processor ☆42 Updated last year
- The open source tools for building, maintaining and deploying Topic Maps-based applications. ☆57 Updated 3 months ago
- SQLite external module to read any structured text file according to your parsing specification. ☆20 Updated 3 weeks ago
- Docker container to provide Apache Tika RESTful API ☆41 Updated 9 years ago
- Neddick: Open Source Information Discovery Platform ☆36 Updated 2 years ago
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 4 years ago
- Create beautiful dashboards from data packages ☆32 Updated 2 years ago
- A cross-platform command line tool for parallelised content extraction and analysis. ☆247 Updated last month
- The smart and simple way to automate document assembly ☆408 Updated 7 years ago
- Suite of tools for detecting changes in web pages and their rendering ☆55 Updated last year
- Simple Storm-like distributed application implementation ☆64 Updated 11 years ago
- Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head ☆171 Updated 5 years ago
- Solr client and user interface for search ☆22 Updated last year
- An open source search engine for corporate data and websites. ☆107 Updated 8 years ago