mozilla / docker-etl
Collection of dockerized ETL jobs managed by data engineering.
☆19 Updated this week
Alternatives and similar repositories for docker-etl:
Users who are interested in docker-etl are comparing it to the repositories listed below.
- End-to-end DataOps platform deployed by Terraform. ☆65 Updated 7 months ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆74 Updated last year
- ETL jobs for Firefox Telemetry ☆28 Updated 5 months ago
- Astronomer Core Docker Images ☆106 Updated 8 months ago
- Documentation and implementation of telemetry ingestion on Google Cloud Platform ☆82 Updated this week
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-dataproc ☆48 Updated last year
- Sample Airflow DAGs ☆62 Updated 2 years ago
- LookML Generator for Glean and Mozilla Data ☆19 Updated last week
- A Python package to centralize some Google Cloud Data Catalog scripts, this repo contains commands like bulk CSV operations that help lev… ☆21 Updated 2 years ago
- Repo with scripts and automation to help ensure best practices in Google Data Catalog ☆13 Updated 3 years ago
- Sample code with integration between Data Catalog and Hive data source. ☆25 Updated 3 weeks ago
- Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level. ☆49 Updated last month
- Multi Cloud Data Tokenization Solution By Using Dataflow and Cloud DLP ☆90 Updated 6 months ago
- PySpark schema generator ☆41 Updated last year
- Pylint plugin for static code analysis on Airflow code ☆93 Updated 4 years ago
- Unity Catalog UI ☆39 Updated 5 months ago
- A Table format agnostic data sharing framework ☆38 Updated last year
- Makes it simple to store test results and visualise them in a BI dashboard ☆41 Updated last week
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-datacatalog ☆52 Updated last year
- Delta reader for the Ray open-source toolkit for building ML applications ☆45 Updated last year
- Delta Lake helper methods. No Spark dependency. ☆22 Updated 5 months ago
- Data validation library for PySpark 3.0.0 ☆34 Updated 2 years ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆61 Updated 2 years ago
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-orchestration-airflow ☆15 Updated last year
- Great Expectations Airflow operator ☆159 Updated last week
- Apache Airflow CI pipeline ☆19 Updated 5 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆28 Updated this week
- Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service ☆64 Updated 9 months ago