getindata / doge-datagen
☆19Updated 2 years ago
Alternatives and similar repositories for doge-datagen:
Users who are interested in doge-datagen are comparing it to the libraries listed below.
- Adapter for dbt that executes dbt pipelines on Apache Flink☆95Updated last year
- Library to convert DBT manifest metadata to Airflow tasks☆48Updated last year
- dbt's adapter for dremio☆48Updated 2 years ago
- Kafka Connector for Iceberg tables☆16Updated last year
- CLI tool to bulk migrate the tables from one catalog to another without a data copy☆77Updated 3 weeks ago
- A Table format agnostic data sharing framework☆38Updated last year
- dbt + Trino demo project, using TPC-H sample data☆19Updated last year
- Declarative text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.☆108Updated this week
- Code snippets used in demos recorded for the blog.☆37Updated last week
- ☆80Updated 2 weeks ago
- Yet Another (Spark) ETL Framework☆21Updated last year
- ☆20Updated last year
- The Internals of Spark on Kubernetes☆71Updated 3 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines☆123Updated this week
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database☆75Updated 3 years ago
- Data validation library for PySpark 3.0.0☆33Updated 2 years ago
- Utility functions for dbt projects running on Trino☆21Updated last year
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.☆29Updated last week
- ☆53Updated 9 months ago
- ☆24Updated 8 months ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi…☆94Updated 2 weeks ago
- Docker environment to stream data from Kafka to Iceberg tables☆28Updated last year
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them☆135Updated last year
- Multi-hop declarative data pipelines☆115Updated this week
- Magic to help Spark pipelines upgrade☆35Updated 7 months ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆56Updated last year
- Sparglim✨ makes PySpark App Configurable and Deploy Spark Connect Server Easier!☆37Updated 2 months ago
- ☆63Updated 5 years ago
- Apache Hive Metastore as a Standalone server in Docker☆74Updated 8 months ago
- A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino☆19Updated 2 years ago