chrismattmann / etllib
This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading for ETL via Apache OODT (or other libs) into Apache Solr.
☆17Updated last year
Alternatives and similar repositories for etllib:
Users who are interested in etllib are comparing it to the libraries listed below:
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them.☆37Updated last year
- All that entity matching, resolution, normalization, enhancement and reconciliation madness, but with a focus on data, not platforms.☆24Updated 3 years ago
- A system to generate SPARQL queries from natural language queries.☆30Updated 2 months ago
- The OpenSextant Gazetteer is a collection of world-wide place name data☆12Updated 7 years ago
- BatchRefine adds batch processing capabilities to OpenRefine☆50Updated 8 years ago
- Big GeoSpatial Data Points Visualization Tool☆19Updated 9 years ago
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 3 years ago
- A framework to allow the matching of string entities using customised sets of transformations and matchers, plus a tool to produce the ne…☆31Updated 8 years ago
- Vizlinc☆14Updated 9 years ago
- Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters.☆44Updated 3 months ago
- Linked Data explorer and SPARQL endpoint☆23Updated 3 years ago
- Google Refine extension for adding columns (extending data) from DBpedia☆39Updated 11 years ago
- Tool to cleanse and semantify datasets from CKAN repositories. Based on OpenRefine.☆23Updated 9 years ago
- Semanticizest: dump parser and client☆20Updated 8 years ago
- Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community.☆39Updated 9 years ago
- A toolkit for clustering web pages based on various similarity measures.☆33Updated 3 years ago
- This is a REST Server endpoint built using Flask and Python.☆24Updated 2 years ago
- For interacting with nutch via Python☆28Updated 2 weeks ago
- An experiment in visualizing your Solr index via term counts, document counts, and memory usage per field and data type.☆15Updated 10 years ago
- Advanced desktop search/corpus exploration prototype☆21Updated 3 years ago
- A high-throughput ontology-based pipeline for data integration☆14Updated last year
- An HTTP proxy for Elasticsearch, Solr (etc.) to prevent a 100% full disk situation.☆11Updated 6 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.☆34Updated 2 years ago
- An RDF Search Engine☆57Updated 7 years ago
- A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm based on M. Ryoo et al paper CVPR 2015…☆10Updated 7 years ago
- D2Refine - A Metadata Harmonization and Validation Workbench☆17Updated 6 years ago
- Deprecated Module: See Xponents or OpenSextantToolbox as active code base.☆31Updated 11 years ago
- Execute OpenRefine JSON scripts without OpenRefine (or Java)☆30Updated 2 years ago
- Docker container to provide Apache Tika RESTful API☆41Updated 9 years ago
- LINKED DATA QUALITY REPORTS☆41Updated 2 years ago