tabular-io / docker-spark-iceberg
☆232 Updated this week
Related projects:
- The Trino (https://trino.io/) adapter plugin for dbt (https://getdbt.com) ☆207 Updated last week
- ☆248 Updated last week
- Simple project to expose a catalog over REST using a Java catalog backend ☆102 Updated this week
- ☆197 Updated last month
- Adapter for dbt that executes dbt pipelines on Apache Flink ☆80 Updated 6 months ago
- Apache Hive Metastore as a Standalone server in Docker ☆64 Updated 3 weeks ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆96 Updated last year
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆111 Updated last month
- Replicates any database (CDC events) to Apache Iceberg (To Cloud Storage) ☆179 Updated this week
- dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks ☆390 Updated this week
- ☆77 Updated last year
- A library that provides useful extensions to Apache Spark and PySpark. ☆193 Updated this week
- Quick Guides from Dremio on Several topics ☆60 Updated 2 weeks ago
- A Python Library to support running data quality rules while the spark job is running⚡ ☆161 Updated last month
- Repository of helm charts for deploying DataHub on a Kubernetes cluster ☆160 Updated 2 weeks ago
- Performance Observability for Apache Spark ☆163 Updated last week
- Apache PyIceberg ☆385 Updated this week
- Delta Lake examples ☆201 Updated 3 months ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆64 Updated 3 years ago
- REST API for Apache Spark on K8S or YARN ☆89 Updated last week
- A dbt adapter for Databricks. ☆211 Updated this week
- Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used … ☆309 Updated last week
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 Updated 10 months ago
- Delta Lake helper methods in PySpark ☆294 Updated 2 weeks ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆185 Updated this week
- Official Dockerfile for Apache Spark ☆95 Updated last month
- Python client for Trino ☆322 Updated 2 weeks ago
- Snowflake Data Source for Apache Spark. ☆213 Updated this week
- Storage connector for Trino ☆90 Updated 2 weeks ago
- A highly efficient daemon for streaming data from Kafka into Delta Lake ☆354 Updated last week