deordie / deordie-digest
Data Engineering Digest
☆28 Updated 11 months ago
Alternatives and similar repositories for deordie-digest
Users that are interested in deordie-digest are comparing it to the libraries listed below
- Code for Apache Hudi, Apache Iceberg and Delta Lake analysis ☆9 Updated last year
- Airflow declarative DAGs via YAML ☆132 Updated last year
- Yet Another (Spark) ETL Framework ☆21 Updated last year
- Command-line interface to quickly generate fake CSV and JSON data ☆73 Updated 11 months ago
- Enforce Best Practices for all your Airflow DAGs. ⭐ ☆101 Updated last week
- ☆18 Updated 3 years ago
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆83 Updated this week
- Flowchart for debugging Spark applications ☆105 Updated 8 months ago
- CLI tool to bulk migrate the tables from one catalog to another without a data copy ☆79 Updated 2 months ago
- A Table format agnostic data sharing framework ☆38 Updated last year
- dbt module for myBI connect ☆12 Updated 2 years ago
- Data Tools Subjective List ☆83 Updated last year
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆64 Updated 3 years ago
- Data validation library for PySpark 3.0.0 ☆33 Updated 2 years ago
- ☆89 Updated 5 months ago
- ☆16 Updated this week
- An open specification for data products in Data Mesh ☆60 Updated 7 months ago
- Spark style guide ☆259 Updated 8 months ago
- ☆58 Updated 10 months ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆77 Updated 2 years ago
- A tool to validate data, built around Apache Spark. ☆101 Updated last month
- The go-to demo for public and private dbt Learn ☆77 Updated 2 months ago
- Magic to help Spark pipelines upgrade ☆35 Updated 8 months ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆46 Updated last year
- The Internals of PySpark ☆26 Updated 5 months ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆70 Updated 8 months ago
- Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html ☆61 Updated 2 years ago
- A DuckDB-powered command line interface for Snowflake security, governance, operations, and cost optimization. ☆40 Updated 10 months ago
- Superglue is a lineage-tracking tool built to help visualize the propagation of data through complex pipelines composed of tables, jobs … ☆157 Updated 2 years ago
- Sample Airflow DAGs ☆62 Updated 2 years ago