chuqbach / Big-Data-Installation
The Complete Big Data Installation Solutions
☆16Updated 2 years ago
Alternatives and similar repositories for Big-Data-Installation
Users who are interested in Big-Data-Installation are comparing it to the libraries listed below
- Open source stack lakehouse☆25Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆75Updated 2 years ago
- Docker with Airflow and Spark standalone cluster☆262Updated 2 years ago
- ☆270Updated last year
- ☆41Updated 3 years ago
- New Generation Opensource Data Stack Demo☆454Updated 3 years ago
- ☆14Updated 2 years ago
- ☆16Updated last year
- My Setup Development Environment as Data Engineer☆35Updated 6 months ago
- Local Environment to Practice Data Engineering☆144Updated last year
- Simple stream processing pipeline☆110Updated last year
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,…☆47Updated last year
- A Microsoft Power BI Custom Connector allowing you to import Trino data into Power BI.☆88Updated last year
- Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.☆506Updated 3 months ago
- Drop-in replacement for Apache Spark UI☆401Updated this week
- Code for dbt tutorial☆168Updated 5 months ago
- ☆64Updated 4 years ago
- Building a Data Pipeline with an Open Source Stack☆55Updated 7 months ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆64Updated 2 years ago
- Delta Lake helper methods in PySpark☆327Updated 3 weeks ago
- Delta Lake examples☆238Updated last year
- Delta-Lake, ETL, Spark, Airflow☆48Updated 3 years ago
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture☆140Updated 3 weeks ago
- ☆26Updated 2 years ago
- Code snippets for Data Engineering Design Patterns book☆331Updated last month
- A Python Library to support running data quality rules while the Spark job is running⚡☆197Updated this week
- Template for Data Engineering and Data Pipeline projects☆116Updated 3 years ago
- A self-contained, ready to run Airflow ELT project. Can be run locally or within codespaces.☆80Updated 2 years ago
- Building a Modern Data Lake with Minio, Spark, Airflow via Docker.☆23Updated last year
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines☆135Updated 3 years ago