amesar / hive-json-schema-gen
Generates Hive schema from JSON
☆15Updated 9 years ago
Alternatives and similar repositories for hive-json-schema-gen
Users who are interested in hive-json-schema-gen are comparing it to the libraries listed below
- Export Airflow metrics (from mysql) in prometheus format☆29Updated 9 months ago
- The sane way of building a data layer in Airflow☆24Updated 6 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o…☆52Updated 7 months ago
- Oozie Workflow to Airflow DAGs migration tool☆90Updated last month
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor…☆70Updated 5 months ago
- DynoYARN is a framework to run simulated YARN clusters and workloads for YARN scale testing.☆60Updated 2 years ago
- Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful …☆144Updated last year
- A library for Spark DataFrame using MinIO Select API☆99Updated 6 years ago
- Airflow declarative DAGs via YAML☆133Updated 2 years ago
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes☆53Updated 5 years ago
- Pylint plugin for static code analysis on Airflow code☆97Updated 5 years ago
- Apiary provides modules which can be combined to create a federated cloud data lake☆37Updated last year
- Presto and Minio on Docker Infrastructure☆43Updated 7 years ago
- Lightweight proxy to expose the UI of an Apache Spark cluster that is behind a firewall☆98Updated 5 years ago
- ☆10Updated 3 years ago
- ETLy is an add-on dashboard service on top of Apache Airflow.☆68Updated 2 years ago
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator☆73Updated 6 years ago
- JSON schema parser for Apache Spark☆82Updated 3 years ago
- ☆39Updated 6 years ago
- An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) from notebooks☆55Updated 2 weeks ago
- ☆108Updated 3 years ago
- Fast iterative local development and testing of Apache Airflow workflows☆204Updated last month
- DBND is an agile pipeline framework that helps data engineering teams track and orchestrate their data processes.☆268Updated 10 months ago
- Hadoop Yarn aggregated log parser utility☆23Updated 6 years ago
- Continuously synchronize directories from remote object store to local filesystem☆109Updated last week
- ☆37Updated 6 years ago
- Convert a CSV file to ORCFile☆26Updated 6 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.☆30Updated last week
- ☆108Updated 2 years ago
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆91Updated last year