Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
☆51Dec 2, 2023Updated 2 years ago
Alternatives and similar repositories for Building-Data-LakeHouse
Users that are interested in Building-Data-LakeHouse are comparing it to the libraries listed below.
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a…☆41Dec 15, 2025Updated 6 months ago
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar…☆44Apr 22, 2023Updated 3 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke…☆21Aug 12, 2025Updated 10 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆78Sep 2, 2023Updated 2 years ago
- ☆13Oct 4, 2023Updated 2 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA…☆45Jan 4, 2024Updated 2 years ago
- NoSQL extract, transform, load (ETL) toolkit with Python☆16Jun 11, 2026Updated last week
- ☆25Mar 15, 2024Updated 2 years ago
- Lecture: Big Data☆14Oct 27, 2025Updated 7 months ago
- POC for a full big data stack (Kafka, Spark, Cassandra, HDFS, Docker, Spring Boot)☆12Dec 16, 2022Updated 3 years ago
- https://aka.ms/lakehouselab☆23Feb 14, 2023Updated 3 years ago
- ☆19May 11, 2023Updated 3 years ago
- ☆41Jul 4, 2022Updated 3 years ago
- ☆18Jul 25, 2024Updated last year
- Spark data pipeline that processes movie ratings data.☆31May 1, 2026Updated last month
- Repo which holds the materials for the EMR Zero To Hero☆28May 7, 2022Updated 4 years ago
- An end-to-end data pipeline extracting music listening habits and producing an insightful dashboard☆18Mar 31, 2024Updated 2 years ago
- This project uses Google Analytics 4 BigQuery Exports as its source data, and offers useful base transformations to provide report-ready …☆20Sep 30, 2022Updated 3 years ago
- 🌟 An end-to-end full-stack data science project, including modelling, MLOps, and data storytelling. ✨☆16Aug 30, 2025Updated 9 months ago
- ☆81Apr 23, 2025Updated last year
- Udacity Data Engineering Nanodegree Project 3☆12Jul 14, 2019Updated 6 years ago
- ☆13Mar 30, 2024Updated 2 years ago
- This repo demonstrates how to capture any incoming request and write it as JSON to nginx log using Nginx and Lua. For more details refer …☆12May 22, 2017Updated 9 years ago
- A list of customized dotfile configs.☆11May 14, 2025Updated last year
- Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.☆20Jan 11, 2018Updated 8 years ago
- ☆10Aug 2, 2021Updated 4 years ago
- Graduation project | Data Lakehouse☆44Feb 9, 2026Updated 4 months ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline☆77Feb 15, 2023Updated 3 years ago
- Reads an HBase table and writes the output as Text, Seq, Avro, or Parquet☆28May 15, 2014Updated 12 years ago
- Repo for learning DBT with Snowflake, featuring projects and models for data transformation and automation☆26Mar 31, 2025Updated last year
- Google Cloud Platform solution that provides an event driven process that flattens (unnests) Google Analytics 360 data that has been expo…☆16Apr 13, 2026Updated 2 months ago
- Machine Learning DevOps Engineer Nanodegree☆11Jan 27, 2022Updated 4 years ago
- ☆10Nov 25, 2021Updated 4 years ago
- CF-Clearance Scraper to pull out the cf token with use of automation tooling and an api :)☆20Sep 18, 2024Updated last year
- Vietnamese Large Language Model (LLM) fine-tuned for the task of Question Answering within the medical and healthcare domain☆26Mar 1, 2024Updated 2 years ago
- Xóm's book library - Free & Public 😎☆203Sep 27, 2025Updated 8 months ago
- ☆12Jul 11, 2022Updated 3 years ago
- ☆21Mar 11, 2025Updated last year
- Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta lake, Postgres+ Debezium CDC, MySQL,…☆29May 19, 2025Updated last year