Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)
☆66Sep 23, 2023Updated 2 years ago
Alternatives and similar repositories for lakehouse
Users who are interested in lakehouse are comparing it to the repositories listed below
- Proof of concept of a big data cluster using open source tools☆11Apr 10, 2024Updated last year
- Repository for Practical Data Pipeline Code☆11Feb 19, 2022Updated 4 years ago
- Gitbook Repo for Practical Data Pipeline☆25Feb 4, 2022Updated 4 years ago
- Run an open-source data LakeHouse locally using Docker Compose☆12May 31, 2024Updated last year
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆75Sep 2, 2023Updated 2 years ago
- Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi☆119Dec 15, 2023Updated 2 years ago
- dbt (data build tool) projects targeting AWS analytics services (redshift, glue, emr, athena) and open table formats☆31Apr 13, 2023Updated 2 years ago
- domain driven design in Go☆14Aug 18, 2020Updated 5 years ago
- A custom end-to-end analytics platform for customer churn☆11May 15, 2025Updated 9 months ago
- Inference API server with echo and gRPC to triton server (golang)☆13Nov 16, 2022Updated 3 years ago
- Unity Catalog Explorer is a TypeScript + Next.js based Web UI for the Unity Catalog OSS.☆13Jun 29, 2024Updated last year
- Hands-on practice building an Elastic Stack data pipeline☆19Nov 20, 2021Updated 4 years ago
- A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino☆22May 30, 2022Updated 3 years ago
- Apache Atlas client☆18Updated this week
- Code to demonstrate data engineering metadata & logging best practices☆21Mar 12, 2024Updated last year
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work☆47Jul 13, 2022Updated 3 years ago
- ☆23Nov 17, 2022Updated 3 years ago
- Fast Campus: practice code for an introduction to machine learning with Python☆21Sep 25, 2020Updated 5 years ago
- A Data Engineering Project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, Dbt, Polars, Doc…☆22Nov 19, 2024Updated last year
- Analytics engineering with dbt - projects and developer environment☆22Sep 27, 2024Updated last year
- Building a Modern Data Lake with Minio, Spark, Airflow via Docker.☆23May 11, 2024Updated last year
- A simple Data Engineering solution for testing or education purposes. You only need to know SQL and Python to understand this project. Da…☆28Jul 2, 2022Updated 3 years ago
- ☆25Mar 15, 2024Updated last year
- Build DataOps platform with Apache Airflow and dbt on AWS☆59Jun 1, 2021Updated 4 years ago
- Open source stack lakehouse☆25Mar 2, 2024Updated 2 years ago
- ☆25Jul 2, 2022Updated 3 years ago
- Low Cost, Simple and Scalable Way of Data Replication to Apache Iceberg/Cloud/Data Lake☆302Feb 23, 2026Updated last week
- Scalable Batch and Stream Data Processing☆29Aug 21, 2024Updated last year
- Kotlin helpers to make writing Flink jobs with Kotlin 27% more delightful☆35Dec 9, 2024Updated last year
- asw.cluster R package for calculating group faultlines☆12Aug 20, 2023Updated 2 years ago
- fine-tuning tutorial☆18Feb 20, 2026Updated 2 weeks ago
- Repository for the dbt Semantic Layer course☆12Updated this week
- Less-Resilient MapReduce framework for Go☆36Jan 17, 2024Updated 2 years ago
- LinkMind is an enterprise-level composite multimodal large model middleware.☆18Updated this week
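- 爱影视 is a film and TV sharing platform built on SpringBoot + Vue with a separated front end and back end. It integrates the film and TV features already available on the market, adds a friend-matching feature, and includes a crawler for Douban movies☆10May 31, 2023Updated 2 years ago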
- Beyond Vibe Coding. Code, Planning, Documentation and Product Management agents.☆70Feb 20, 2026Updated 2 weeks ago
- ☆11Oct 1, 2025Updated 5 months ago
- DOS Program Development☆13Nov 9, 2022Updated 3 years ago
- User-friendly viewer for Parquet files☆10Jan 10, 2026Updated last month