indierambler / data-environment
Docker Compose environment for big data research and machine learning development
☆11 Updated last year
Alternatives and similar repositories for data-environment:
Users interested in data-environment are comparing it to the repositories listed below:
- Docker image that builds a patched Apache Spark with AWS Glue support as metastore ☆15 Updated 9 months ago
- Custom PySpark Data Sources ☆42 Updated this week
- Delta Lake examples ☆221 Updated 5 months ago
- A Python Library to support running data quality rules while the spark job is running ⚡ ☆180 Updated 2 weeks ago
- A curated list of open source tools used in analytics platforms and data engineering ecosystem ☆286 Updated 3 weeks ago
- Delta Lake helper methods in PySpark ☆322 Updated 6 months ago
- dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks ☆424 Updated last month
- Local Environment to Practice Data Engineering ☆143 Updated 3 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆63 Updated last year
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆212 Updated last week
- A highly efficient daemon for streaming data from Kafka into Delta Lake ☆393 Updated 3 weeks ago
- Code samples, etc. for Databricks ☆63 Updated 2 weeks ago
- Delta-Lake, ETL, Spark, Airflow ☆46 Updated 2 years ago
- Apache PyIceberg ☆657 Updated this week
- Home of the Open Data Contract Standard (ODCS). ☆468 Updated this week
- Data product portal created by Dataminded ☆180 Updated this week
- Slow & local data allows you to move fast and deliver business value for the 99.9% of the data challenges. ☆186 Updated 2 weeks ago
- 📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems. ☆41 Updated 2 months ago
- Performance Observability for Apache Spark ☆239 Updated last week
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆240 Updated last month
- Step-by-step tutorial on building a Kimball dimensional model with dbt ☆135 Updated 8 months ago
- Spark style guide ☆258 Updated 6 months ago
- Docker with Airflow and Spark standalone cluster ☆253 Updated last year
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆60 Updated 2 months ago
- Code for dbt tutorial ☆155 Updated 10 months ago
- Don't Panic. This guide will help you when it feels like the end of the world. ☆23 Updated 9 months ago
- Collection of dbt Tips and Tricks ☆384 Updated 2 years ago