adidas / m3d-api
Metadata Driven Development (m3d) is a cloud and platform agnostic framework for the automated creation, management and governance of data lakes.
☆31Updated 2 years ago
Alternatives and similar repositories for m3d-api
Users that are interested in m3d-api are comparing it to the libraries listed below
- M3D Engine is a Spark application for the development of scalable data transformations and ingestions in data lakes.☆18Updated 4 years ago
- Data Profiler for AWS Glue Data Catalog application as described in the AWS Big Data Blog post "Build an automatic data profiling and rep…☆20Updated 5 years ago
- A curated list of awesome Databricks resources, including Spark☆19Updated 11 months ago
- ☆11Updated 5 years ago
- ☆96Updated last year
- This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS …☆19Updated 3 years ago
- Curated list of resources about Apache Airflow☆19Updated 4 years ago
- Yet Another (Spark) ETL Framework☆21Updated last year
- A K8s-based infrastructure for analytics☆24Updated 5 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆55Updated 2 years ago
- Spark app to merge different schemas☆23Updated 4 years ago
- Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.☆79Updated 3 weeks ago
- Data Brewery is an ETL (Extract-Transform-Load) program that connects to many data sources (cloud services, databases, ...) and manages dat…☆16Updated 4 years ago
- Hadoop/Hive/Spark container to perform CI tests☆11Updated 4 years ago
- DataHub on AWS demonstration resources☆10Updated 2 years ago
- Public source code for the Batch Processing with Apache Beam (Python) online course☆18Updated 4 years ago
- Universal interface for data services☆16Updated 2 years ago
- Sample configuration to deploy a modern data platform.☆88Updated 3 years ago
- 💻 CLI for reporting events to Faros platform☆14Updated 3 weeks ago
- Awesome list of dataops products, open source and resources☆24Updated 3 years ago
- Pipeline library for StreamSets Data Collector and Transformer☆33Updated 2 years ago
- An Ansible collection for lifecycle and management of Cloudera CDP Private Cloud resources on bare metal, IaaS, and PaaS.☆34Updated 2 weeks ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection☆18Updated 8 years ago
- Delta Lake Documentation☆49Updated 11 months ago
- Ansible roles to deploy Kubernetes, JupyterHub, Jupyter Enterprise Gateway and Spark on Kubernetes cluster☆39Updated 4 years ago
- Awesome List for Data Operations☆24Updated 4 years ago
- Event-triggered plugins for Airflow☆21Updated 5 years ago
- Build and run Spark Structured Streaming pipelines in Hadoop - project using PySpark.☆13Updated 6 years ago
- Build DataOps platform with Apache Airflow and dbt on AWS☆55Updated 4 years ago
- The sane way of building a data layer in Airflow☆24Updated 5 years ago