chuqbach / Big-Data-Installation
The Complete Big Data Installation Solutions
☆16 Updated 2 years ago
Alternatives and similar repositories for Big-Data-Installation
Users who are interested in Big-Data-Installation are comparing it to the repositories listed below
- Open source stack lakehouse ☆25 Updated last year
- My Setup Development Environment as Data Engineer ☆35 Updated 6 months ago
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,… ☆47 Updated last year
- Nyc_Taxi_Data_Pipeline - DE Project ☆136 Updated last year
- Local Environment to Practice Data Engineering ☆144 Updated last year
- ☆14 Updated 2 years ago
- Docker with Airflow and Spark standalone cluster ☆262 Updated 2 years ago
- Simple stream processing pipeline ☆110 Updated last year
- Code snippets for Data Engineering Design Patterns book ☆331 Updated last month
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆140 Updated 3 weeks ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆75 Updated 2 years ago
- ☆41 Updated 3 years ago
- Simple repo to demonstrate how to submit a spark job to EMR from Airflow ☆34 Updated 5 years ago
- Just starting your DE journey or along the way already? I will be sharing a short list of DATA-ENGINEERING-CENTRED books that covers the… ☆34 Updated 3 years ago
- Spark all the ETL Pipelines ☆36 Updated 2 years ago
- Data Engineering Handbook for beginners and everyone ☆79 Updated last year
- Data Engineering examples for Airflow, Prefect; dbt for BigQuery, Redshift, ClickHouse, Postgres, DuckDB; PySpark for Batch processing; K… ☆69 Updated last week
- My first attempt at a rough ETL pipeline; technologies include spark, GCS, prefect orchestration, and terraform ☆14 Updated 3 years ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆49 Updated 2 years ago
- End to end data engineering project ☆58 Updated 3 years ago
- ☆270 Updated last year
- An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka… ☆312 Updated 11 months ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆64 Updated 2 years ago
- End to end data engineering project with kafka, airflow, spark, postgres and docker. ☆108 Updated last month
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian ☆228 Updated 2 years ago
- Udacity Data Engineering Nano Degree (DEND) ☆189 Updated 6 years ago
- ☆16 Updated last year
- Code for dbt tutorial ☆168 Updated 5 months ago
- ☆59 Updated 2 years ago
- Docker Airflow - Contains a docker compose file for Airflow 2.0 ☆70 Updated 3 years ago