indierambler / data-environment
Docker Compose environment for big data research and machine learning development
☆11 Updated last year
Alternatives and similar repositories for data-environment
Users who are interested in data-environment are comparing it to the repositories listed below
- Docker image that builds a patched Apache Spark with AWS Glue support as metastore ☆17 Updated last year
- Run your dbt Core or dbt Fusion projects as Apache Airflow DAGs and Task Groups with a few lines of code ☆1,131 Updated this week
- Custom PySpark Data Sources ☆85 Updated last week
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆682 Updated 11 months ago
- This extension makes vscode seamlessly work with dbt™: Auto-complete, preview, column lineage, AI docs generation, health checks, cost es… ☆560 Updated last week
- Apache Airflow integration for dbt ☆411 Updated last year
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆47 Updated last year
- New Generation Opensource Data Stack Demo ☆454 Updated 3 years ago
- Spark style guide ☆271 Updated last year
- Example repository showing how to build a data platform with Prefect, dbt and Snowflake ☆109 Updated 3 years ago
- BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. ☆420 Updated last week
- PySpark test helper methods with beautiful error messages ☆752 Updated 3 weeks ago
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ☆506 Updated 3 months ago
- Construct Apache Airflow DAGs Declaratively via YAML configuration files ☆1,415 Updated this week
- A production-ready PySpark project template with medallion architecture, Python packaging, unit tests, integration tests, CI/CD automatio… ☆48 Updated this week
- Template for a data contract used in a data mesh. ☆486 Updated last year
- ☆270 Updated last year
- Scalable and efficient data transformation framework - backwards compatible with dbt. ☆2,876 Updated this week
- This repository has moved into https://github.com/dbt-labs/dbt-adapters ☆443 Updated 6 months ago
- 📙 Awesome Data Catalogs and Observability Platforms. ☆987 Updated 5 months ago
- Port(ish) of Great Expectations to dbt test macros ☆1,204 Updated last year
- A curated list of awesome dbt resources ☆1,638 Updated this week
- Delta Lake helper methods in PySpark ☆327 Updated 3 weeks ago
- Drop-in replacement for Apache Spark UI ☆401 Updated this week
- A self-contained dbt project for testing purposes ☆517 Updated last year
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆64 Updated 2 years ago
- A highly efficient daemon for streaming data from Kafka into Delta Lake ☆427 Updated 9 months ago
- A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineer… ☆575 Updated this week
- A curated list of data engineering tools for software developers ☆502 Updated 8 years ago
- The next-generation engine for dbt ☆621 Updated this week