chrismattmann / etllib
This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading for ETL via Apache OODT (or other libs) into Apache Solr.
☆16 Updated 9 months ago
Related projects
Alternatives and complementary repositories for etllib
- All that entity matching, resolution, normalization, enhancement and reconciliation madness, but with a focus on data, not platforms. ☆24 Updated 2 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆36 Updated 7 months ago
- Efficient indexing and retrieval of OCR bounding boxes in Solr ☆22 Updated 5 years ago
- The OpenSextant Gazetteer is a collection of world-wide place name data ☆12 Updated 6 years ago
- LINKED DATA QUALITY REPORTS ☆41 Updated 2 years ago
- Advanced desktop search/corpus exploration prototype ☆21 Updated 3 years ago
- Big GeoSpatial Data Points Visualization Tool ☆19 Updated 8 years ago
- SKOS analysis for Elasticsearch ☆54 Updated 8 years ago
- For interacting with nutch via Python ☆23 Updated 2 weeks ago
- BatchRefine adds batch processing capabilities to OpenRefine ☆50 Updated 7 years ago
- Vizlinc ☆14 Updated 8 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago
- OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more. ☆112 Updated last year
- 💠 + 📚 OpenRefine on Binder! ☆40 Updated 4 months ago
- An HTTP proxy for Elasticsearch, Solr (etc.) to prevent a 100% full disk situation. ☆11 Updated 6 years ago
- Execute OpenRefine JSON scripts without OpenRefine (or Java) ☆29 Updated last year
- A framework to allow the matching of string entities using customised sets of transformations and matchers, plus a tool to produce the ne… ☆30 Updated 7 years ago
- Prototype SOLR-powered web archive exploration UI. ☆43 Updated 4 years ago
- Utilities for working with streaming XML pipelines ☆13 Updated 8 years ago
- System for mining Wikipedia Usage data to read our collective mind ☆21 Updated 10 years ago
- SKOS Support for Apache Lucene and Solr ☆56 Updated 3 years ago
- A project aiming "to significantly advance the state of the art with regard to indexing and querying biomedical data with freely availabl… ☆77 Updated 3 months ago
- An experiment in visualizing your Solr index via term counts, document counts, and memory usage per field and data type. ☆15 Updated 9 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆32 Updated last year
- LOD-enabled version of OpenRefine. (This project is not actively maintained anymore) ☆61 Updated 5 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆46 Updated 2 years ago
- sparql-stream sensor queries ☆16 Updated 8 years ago
- Warcbase is an open-source platform for managing and analyzing web archives ☆161 Updated 6 years ago
- Fcrepo4 webapp plus optional fcrepo dependencies ☆13 Updated 4 years ago
- a CLI suggestion tool for Wikidata entities ☆29 Updated 8 years ago