chrismattmann / etllib
This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading for ETL via Apache OODT (or other libs) into Apache Solr.
☆18 · Jan 27, 2024 · Updated 2 years ago
Alternatives and similar repositories for etllib
Users that are interested in etllib are comparing it to the libraries listed below
- A DropWizard wrapper around Apache Tika. ☆10 · Dec 22, 2016 · Updated 9 years ago
- Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features. ☆108 · Apr 9, 2025 · Updated 10 months ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆38 · Apr 9, 2024 · Updated last year
- This is a Fact based Question Answering System using Apache Solr as backend search engine, Wikipedia dumps as information source, Apache … ☆26 · Jan 21, 2026 · Updated 3 weeks ago
- Multi-step AI agents powered by Gemini 2.0 and the LangGraph framework. These agents orchestrate complex workflows and enhance their reas… ☆10 · Dec 19, 2024 · Updated last year
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆34 · May 3, 2023 · Updated 2 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆34 · Oct 27, 2021 · Updated 4 years ago
- Generative and Parametric design code: featuring Processing / Python / Javascript / HTML / CSS ☆14 · Nov 4, 2020 · Updated 5 years ago
- Overcooked! 2 TAS Development Framework ☆10 · Aug 18, 2023 · Updated 2 years ago
- Description des formats de fichier ☆11 · Feb 4, 2022 · Updated 4 years ago
- ☆10 · Apr 20, 2023 · Updated 2 years ago
- ☆12 · Nov 21, 2025 · Updated 2 months ago
- Exploratory Data Analysis of Time Series Data and Forecasting using Naïve Approach, Moving Average Method, Simple Exponential Smoothenin… ☆12 · Jul 2, 2018 · Updated 7 years ago
- Provides a fully configured Visual Studio Solution for ORTools ☆10 · Aug 30, 2019 · Updated 6 years ago
- Watsonx Assistant with Milvus as Vector Database ☆12 · Mar 31, 2025 · Updated 10 months ago
- World Model for Natural Gas Trade ☆10 · Feb 8, 2018 · Updated 8 years ago
- An adaptive user interface for the Deriva platform. ☆10 · Updated this week
- OSS2017 - Open Science for Synthesis: Gulf Research Program ☆10 · May 12, 2019 · Updated 6 years ago
- Java command line tool to convert PAGE XML files with layout and text content to PDF ☆10 · Apr 27, 2020 · Updated 5 years ago
- Code for preservation simulation/modeling project ☆10 · Aug 24, 2021 · Updated 4 years ago
- R script for visualising patient ward movements as timelines ☆13 · May 13, 2022 · Updated 3 years ago
- ☆14 · Jan 3, 2024 · Updated 2 years ago
- A fork of the disktype disk and disk image format detection tool ☆11 · Nov 16, 2016 · Updated 9 years ago
- SIARD (Software Independent Archiving of Relational Databases) - an open file format for the long-term archiving of relational databases ☆12 · Nov 14, 2024 · Updated last year
- Application which supports the UNC Libraries' Digital Collections Repository ☆12 · Updated this week
- An example repo for implementing Segment's Javascript source through React ☆12 · Mar 30, 2024 · Updated last year
- A collection of some awesome public projects about LLM-based Web Agents and Tools. ☆12 · Apr 25, 2024 · Updated last year
- Hadoop-based tool for extraction of large scale synchronous grammars for paraphrasing and machine translation ☆15 · Dec 2, 2016 · Updated 9 years ago
- A data management platform for the web ☆11 · Feb 2, 2026 · Updated last week
- Supporting Emergency Room Decision Making with Relevant Scientific Literature. Supervised by: Yassine Benajiba. Course: Introdu… ☆10 · Jan 19, 2018 · Updated 8 years ago
- This repository contains examples of XML and XSLT files that can be used to control adding/viewing/editing/indexing of metadata in Preser… ☆10 · Jan 8, 2019 · Updated 7 years ago
- Reddit Data Science Project Ideas ☆11 · Dec 28, 2019 · Updated 6 years ago
- Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community. ☆39 · Apr 15, 2016 · Updated 9 years ago
- ☆11 · Jul 18, 2016 · Updated 9 years ago
- Digital preservation policies and strategies ☆12 · Mar 29, 2024 · Updated last year
- MeMAD multimodal content analysis and machine translation: collection of tools and libraries ☆12 · May 17, 2021 · Updated 4 years ago
- 📕 Ansible playbooks for Raspberry Pi, Linux and Mac ☆14 · Dec 22, 2024 · Updated last year
- PERICLES Extraction Tool ☆17 · May 12, 2017 · Updated 8 years ago
- Celery plugin to autoscale based on available CPU, memory, or other system attributes. ☆11 · Dec 8, 2017 · Updated 8 years ago