mozilla / docker-etl
Collection of dockerized ETL jobs managed by data engineering.
☆21 Updated last week
Alternatives and similar repositories for docker-etl
Users who are interested in docker-etl are comparing it to the repositories listed below.
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆168 Updated 2 years ago
- Build your feature store with macros right within your dbt repository ☆39 Updated 3 years ago
- Any Airflow project day 1, you can spin up a local desktop Kubernetes Airflow environment AND one in Google Cloud Composer with tested da… ☆113 Updated 2 years ago
- A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows. ☆83 Updated last year
- Utility functions for dbt projects running on Spark ☆34 Updated last month
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆115 Updated last week
- Weekly Data Engineering Newsletter ☆96 Updated last year
- ☆23 Updated 4 years ago
- Mapping of DWH database tables to business entities, attributes & metrics in Python, with automatic creation of flattened tables ☆75 Updated 2 years ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 Updated 3 years ago
- Makes it simple to store test results and visualise them in a BI dashboard ☆52 Updated last month
- The go-to demo for public and private dbt Learn ☆82 Updated 10 months ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆78 Updated 2 years ago
- Great Expectations Airflow operator ☆170 Updated last week
- Data-aware orchestration with dagster, dbt, and airbyte ☆31 Updated 3 years ago
- Full stack data engineering tools and infrastructure set-up ☆57 Updated 4 years ago
- A Python framework for data processing on GCP. ☆120 Updated 9 months ago
- Public source code for the Batch Processing with Apache Beam (Python) online course ☆18 Updated 5 years ago
- End-to-end DataOps platform deployed by Terraform. ☆69 Updated 10 months ago
- The Picnic Data Vault framework. ☆129 Updated 3 weeks ago
- Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level. ☆61 Updated last month
- [ARCHIVED] The Presto adapter plugin for dbt Core ☆32 Updated 2 years ago
- Supporting materials/code examples for my course in data engineering for machine learning. ☆39 Updated 3 years ago
- A DBT package to perform DataOps & administrative CI/CD on your data warehouse. ☆16 Updated 4 years ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆62 Updated 3 years ago
- A bunch of hacks developed around dbt ☆48 Updated 6 years ago
- Sample configuration to deploy a modern data platform. ☆89 Updated 4 years ago
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆76 Updated last year
- Rules based grant management for Snowflake ☆41 Updated 7 years ago
- Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code. ☆127 Updated 4 years ago