agile-lab-dev / witboost-starter-kit
Witboost is a versatile platform that addresses a wide range of sophisticated data engineering challenges. The Starter Kit showcases the integration capabilities and provides a "batteries-included" product.
☆21 · Updated last week
Alternatives and similar repositories for witboost-starter-kit:
Users who are interested in witboost-starter-kit are comparing it to the repositories listed below.
- Declarative, text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. ☆90 · Updated this week
- An open specification for data products in Data Mesh ☆56 · Updated 5 months ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆68 · Updated 7 months ago
- ☆38 · Updated 11 months ago
- Adapter for dbt that executes dbt pipelines on Apache Flink ☆95 · Updated last year
- PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them ☆49 · Updated last week
- Library to convert DBT manifest metadata to Airflow tasks ☆48 · Updated last year
- A write-audit-publish implementation on a data lake without the JVM ☆46 · Updated 8 months ago
- Sample configuration to deploy a modern data platform. ☆88 · Updated 3 years ago
- Orchestrate Spark Jobs from Kubeflow Pipelines and poll for the status. ☆53 · Updated 2 years ago
- Weekly Data Engineering Newsletter ☆95 · Updated 9 months ago
- Delta Lake and filesystem helper methods ☆51 · Updated last year
- A curated list of awesome blogs, videos, tools and resources about Data Contracts ☆173 · Updated 8 months ago
- Code snippets used in demos recorded for the blog. ☆34 · Updated last week
- dbt-github-workflow is a boilerplate that contains all the necessary configurations to set up a simple CI/CD pipeline for your data model… ☆17 · Updated 3 years ago
- Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and… ☆28 · Updated 2 years ago
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, and HDFS. ☆75 · Updated this week
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 · Updated last year
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆29 · Updated this week
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆94 · Updated this week
- A dbt (data build tool) project you can use for testing purposes or experimentation ☆36 · Updated last year
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆60 · Updated 3 months ago
- A table-format-agnostic data sharing framework ☆38 · Updated last year
- A portable Datamart and Business Intelligence suite built with Docker, sqlmesh + dbtcore, DuckDB and Superset ☆49 · Updated 5 months ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆45 · Updated last year
- Yet Another (Spark) ETL Framework ☆20 · Updated last year
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ☆21 · Updated 10 months ago
- dbt adapter for Dremio ☆48 · Updated 2 years ago
- A DataOps framework for building a lakehouse. ☆50 · Updated last week
- dbt package that reproduces dbt incremental materialization using Snowflake streams ☆31 · Updated 5 months ago