amesar / hive-json-schema-gen
Generates Hive schema from JSON
☆15 Updated 8 years ago
Related projects
Alternatives and complementary repositories for hive-json-schema-gen
- An Operator for scheduling and executing NiFi Flows as Jobs on Kubernetes ☆53 Updated 4 years ago
- A Spark-based data comparison tool at scale which facilitates software development engineers to compare a plethora of pair combinations o… ☆48 Updated 10 months ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆62 Updated 6 months ago
- A dbt adapter for Decodable ☆11 Updated 10 months ago
- Export Airflow metrics (from mysql) in prometheus format ☆29 Updated 2 years ago
- A library for Spark DataFrame using MinIO Select API ☆96 Updated 5 years ago
- Recipes of publicly-available Jupyter images ☆8 Updated last month
- The sane way of building a data layer in Airflow ☆24 Updated 4 years ago
- Data Sketches for Apache Spark ☆21 Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds. ☆86 Updated 8 months ago
- hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different. ☆28 Updated 6 years ago
- Demonstration of a Hive Input Format for Iceberg ☆26 Updated 3 years ago
- ☆77 Updated last year
- Collection of HDP Tuning Tricks & Tips (unofficial guide) ☆17 Updated 7 years ago
- Graph Analytics with Apache Kafka ☆101 Updated last week
- Dione - a Spark and HDFS indexing library ☆50 Updated 8 months ago
- A library on top of either pex or conda-pack to make your Python code easily available on a cluster ☆45 Updated this week
- CLI tool to bulk migrate the tables from one catalog to another without a data copy ☆61 Updated this week
- Oozie Workflow to Airflow DAGs migration tool ☆87 Updated 3 weeks ago
- ☆28 Updated last year
- Yet Another Spark SQL JDBC/ODBC server based on the PostgreSQL V3 protocol ☆34 Updated 2 years ago
- Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark. ☆75 Updated 7 months ago
- Spark cloud integration: tests, cloud committers and more ☆19 Updated 8 months ago
- Instant access to the Spark cluster from anywhere ☆16 Updated 4 years ago
- Convert a CSV file to ORCFile ☆26 Updated 5 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆92 Updated last month
- Hadoop Yarn aggregated log parser utility ☆23 Updated 4 years ago
- A testing framework for Trino ☆26 Updated this week
- Quark is a data virtualization engine over analytic databases. ☆99 Updated 7 years ago