gglanzani / hmsclient
☆20Updated 11 months ago
Related projects:
- A Python client for Apache Livy, enabling use of remote Apache Spark clusters.☆70Updated 2 years ago
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them☆135Updated 10 months ago
- CLI tool to bulk migrate tables from one catalog to another without a data copy☆51Updated this week
- A tool and library for easily deploying applications on Apache YARN☆142Updated 6 months ago
- Spark metrics related custom classes and sinks (e.g. Prometheus)☆175Updated 2 years ago
- ☆77Updated last year
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A…☆111Updated last month
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆82Updated 5 months ago
- Replicates any database (CDC events) to Apache Iceberg (To Cloud Storage)☆179Updated this week
- Spark-Radiant is an Apache Spark performance and cost optimizer☆25Updated last year
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…☆296Updated 8 months ago
- Python client for Hadoop® YARN API☆109Updated last year
- Simple project to expose a catalog over REST using a Java catalog backend☆103Updated this week
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive☆183Updated last year
- Storage connector for Trino☆90Updated 3 weeks ago
- Spark ClickHouse Connector built on the DataSourceV2 API☆181Updated this week
- Visualize column-level data lineage in Spark SQL☆85Updated 2 years ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.☆267Updated last month
- A re-implementation of Hadoop DistCP in Apache Spark☆42Updated 9 months ago
- Spark package for checking data quality☆222Updated 4 years ago
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating)☆62Updated 4 months ago
- Pylint plugin for static code analysis on Airflow code☆89Updated 3 years ago
- Parcel for Apache Airflow☆17Updated 5 years ago
- A list of Presto/Trino resources☆22Updated last year
- Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.☆86Updated 6 months ago
- DynoYARN is a framework to run simulated YARN clusters and workloads for YARN scale testing.☆58Updated last year
- An S3 shuffle plugin for Apache Spark to enable elastic scaling for generic Spark workloads.☆37Updated 4 months ago
- A process that runs in unison with Apache Airflow to control the Scheduler process to ensure High Availability☆232Updated 2 years ago
- A load balancer / proxy / gateway for prestodb☆356Updated last month
- Setup for running Trino with Hive Metastore on Kubernetes☆98Updated 2 years ago