Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
☆52Dec 2, 2023Updated 2 years ago
Alternatives and similar repositories for Building-Data-LakeHouse
Users that are interested in Building-Data-LakeHouse are comparing it to the libraries listed below.
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar…☆43Apr 22, 2023Updated 3 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke…☆20Aug 12, 2025Updated 8 months ago
- A web app for both Text-based and Visual Question Answering.☆13Nov 13, 2023Updated 2 years ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆78Sep 2, 2023Updated 2 years ago
- ☆23Feb 5, 2024Updated 2 years ago
- ☆13Oct 4, 2023Updated 2 years ago
- NoSQL extract, transform, load (ETL) toolkit with Python☆16Apr 26, 2026Updated last week
- Basic framework utilities to quickly start writing production ready Apache Spark applications☆36Dec 15, 2024Updated last year
- Trino + Hive + MinIO with Postgres in Docker Compose☆27Aug 18, 2023Updated 2 years ago
- Lecture: Big Data☆14Oct 27, 2025Updated 6 months ago
- Uses Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline☆32Oct 25, 2023Updated 2 years ago
- POC for a full big data stack (Kafka, Spark, Cassandra, HDFS, Docker, Spring Boot)☆12Dec 16, 2022Updated 3 years ago
- https://aka.ms/lakehouselab☆23Feb 14, 2023Updated 3 years ago
- Commercetools Python SDK☆17Apr 28, 2026Updated last week
- Spark data pipeline that processes movie ratings data.☆31Updated this week
- Repo which holds the materials for the EMR Zero To Hero☆27May 7, 2022Updated 4 years ago
- An end-to-end data pipeline extracting music listening habits and producing an insightful dashboard☆17Mar 31, 2024Updated 2 years ago
- A ready to go Big Data cluster (Hadoop + Hadoop Streaming + Spark + PySpark) with Docker and Docker Swarm!☆22May 20, 2025Updated 11 months ago
- ☆81Apr 23, 2025Updated last year
- ☆12Feb 27, 2024Updated 2 years ago
- Udacity Data Engineering Nanodegree Project 3☆12Jul 14, 2019Updated 6 years ago
- AKS Course - Pluralsight☆10Oct 29, 2022Updated 3 years ago
- A list of customized dotfile configs.☆11May 14, 2025Updated 11 months ago
- Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.☆20Jan 11, 2018Updated 8 years ago
- StarCraft 2 Data Pipeline with Airflow, DuckDB and Streamlit☆16Mar 14, 2024Updated 2 years ago
- ☆10Aug 2, 2021Updated 4 years ago
- Graduation project | Data Lakehouse☆42Feb 9, 2026Updated 2 months ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline☆76Feb 15, 2023Updated 3 years ago
- The MapsIndoors SDK is the idea of integrating everything on your venue, like people, goods, offices, shops, rooms and buildings with the…☆25Oct 1, 2024Updated last year
- Repo for learning DBT with Snowflake, featuring projects and models for data transformation and automation☆26Mar 31, 2025Updated last year
- Code for YouTube channel☆10Apr 15, 2022Updated 4 years ago
- Mass Suricata rules creator, from a list of domains☆14Sep 14, 2018Updated 7 years ago
- Google Cloud Platform solution that provides an event driven process that flattens (unnests) Google Analytics 360 data that has been expo…☆16Apr 13, 2026Updated 3 weeks ago
- Machine Learning DevOps Engineer Nanodegree☆11Jan 27, 2022Updated 4 years ago
- ☆10Nov 25, 2021Updated 4 years ago
- ☆12Jul 11, 2022Updated 3 years ago
- ☆21Mar 11, 2025Updated last year
- ☆10Jul 19, 2018Updated 7 years ago
- Resources backing the Feast fraud tutorial on GCP☆14May 31, 2022Updated 3 years ago