mozilla / docker-etl
Collection of dockerized ETL jobs managed by data engineering.
☆20 · Updated last week
Alternatives and similar repositories for docker-etl
Users interested in docker-etl are comparing it to the libraries listed below.
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflows ☆222 · Updated 2 weeks ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆78 · Updated 2 years ago
- Soda Spark is a PySpark library that helps you test your data in Spark DataFrames ☆64 · Updated 3 years ago
- End-to-end DataOps platform deployed by Terraform. ☆69 · Updated 8 months ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆114 · Updated 4 months ago
- PySpark schema generator ☆43 · Updated 2 years ago
- Delta Lake helper methods. No Spark dependency. ☆23 · Updated last year
- Making DAG construction easier ☆280 · Updated 2 months ago
- Delta Lake examples ☆234 · Updated last year
- Weekly Data Engineering Newsletter ☆96 · Updated last year
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆61 · Updated 3 years ago
- BigQuery ETL ☆322 · Updated this week
- ☆54 · Updated 10 months ago
- A table-format-agnostic data sharing framework ☆42 · Updated last year
- Big Data Demystified meetup and blog examples ☆31 · Updated last year
- [ARCHIVED] The Presto adapter plugin for dbt Core ☆33 · Updated last year
- Data-aware orchestration with dagster, dbt, and airbyte ☆30 · Updated 2 years ago
- Great Expectations Airflow operator ☆169 · Updated last week
- Utility functions for dbt projects running on Spark ☆33 · Updated last month
- Full-stack data engineering tools and infrastructure setup ☆57 · Updated 4 years ago
- Read Delta tables without any Spark (see the sketch after this list) ☆47 · Updated last year
- Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level. ☆61 · Updated this week
- Pylint plugin for static code analysis on Airflow code ☆96 · Updated 5 years ago
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs … ☆159 · Updated 2 years ago
- A simple and easy-to-use Data Quality (DQ) tool built with Python. ☆50 · Updated 2 years ago
- Airflow configuration for Telemetry ☆197 · Updated this week
- PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows ☆44 · Updated last month
- Astronomer Core Docker Images ☆106 · Updated last year
- Unity Catalog UI ☆43 · Updated last year
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, and more ☆77 · Updated this week
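
Several entries above deal with reading Delta Lake tables without a Spark cluster. As a rough illustration of what that looks like in practice, here is a minimal sketch using the `deltalake` package (the delta-rs Python bindings); the repositories listed may expose different APIs, and the table path below is hypothetical, so treat this as an assumption rather than a description of any specific project.

```python
# Minimal sketch: reading a Delta table with plain Python, no Spark.
# Assumes the `deltalake` package (delta-rs bindings) is installed;
# the listed repos may use different libraries or APIs.
from deltalake import DeltaTable

# Hypothetical path; point it at any existing Delta table directory.
table = DeltaTable("./data/events")

print(table.version())   # current table version
print(table.files())     # underlying Parquet data files
df = table.to_pandas()   # load the table into a pandas DataFrame
print(df.head())
```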