hussein-awala / spark-on-k8s
A Python package to submit and manage Apache Spark applications on Kubernetes.
☆42Updated 3 weeks ago
Alternatives and similar repositories for spark-on-k8s
Users who are interested in spark-on-k8s are comparing it to the libraries listed below.
- A Python Library to support running data quality rules while the spark job is running⚡☆189Updated this week
- A Python package that creates fine-grained dbt tasks on Apache Airflow☆70Updated 2 weeks ago
- Adapter for dbt that executes dbt pipelines on Apache Flink☆95Updated last year
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever…☆260Updated last month
- Drop-in replacement for Apache Spark UI☆297Updated last week
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow☆218Updated last month
- ☆267Updated 10 months ago
- Delta Lake examples☆226Updated 10 months ago
- New Generation Opensource Data Stack Demo☆441Updated 2 years ago
- The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com)☆246Updated last week
- A Microsoft Power BI Custom Connector allowing you to import Trino data into Power BI.☆73Updated 7 months ago
- Airflow Providers containing Deferrable Operators & Sensors from Astronomer☆149Updated this week
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects☆219Updated 4 months ago
- Grafana dashboards and StatsD exporter config for Airflow monitoring☆284Updated last year
- Delta Lake helper methods in PySpark☆325Updated 11 months ago
- Repository of helm charts for deploying DataHub on a Kubernetes cluster☆194Updated last week
- Apache Hive Metastore as a Standalone server in Docker☆79Updated last year
- Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.☆375Updated 3 months ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆60Updated last year
- dbt + Trino demo project, using TPC-H sample data☆19Updated last year
- A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.☆207Updated last week
- Python client for Trino☆392Updated last week
- Open source stack lakehouse☆26Updated last year
- dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks☆441Updated last month
- Apache Spark Kubernetes Operator☆206Updated this week
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database☆76Updated 3 years ago
- Open Control Plane for Tables in Data Lakehouse☆369Updated this week
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A…☆127Updated this week
- Data product portal created by Dataminded☆190Updated this week
- A curated list of awesome blogs, videos, tools and resources about Data Contracts☆178Updated last year