Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
☆50Dec 2, 2023Updated 2 years ago
Alternatives and similar repositories for Building-Data-LakeHouse
Users that are interested in Building-Data-LakeHouse are comparing it to the libraries listed below.
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar…☆43Apr 22, 2023Updated 2 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke…☆20Aug 12, 2025Updated 7 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆75Sep 2, 2023Updated 2 years ago
- ☆22Feb 5, 2024Updated 2 years ago
- Fully automated csv to dashboard pipeline using Terraform, Google Cloud Storage, BigQuery, dbt, Prefect and Looker Studio. Peer ranked …☆41Nov 15, 2024Updated last year
- Langgraph demo with network devices and telemetry☆21Aug 20, 2025Updated 7 months ago
- ☆13Oct 4, 2023Updated 2 years ago
- ☆25Mar 15, 2024Updated 2 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications☆36Dec 15, 2024Updated last year
- Image building contents for running Spark standalone on Kubernetes☆16Apr 10, 2020Updated 5 years ago
- Uses Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline☆30Oct 25, 2023Updated 2 years ago
- POC for all the stack of big data (kafka, spark, cassandra, hdfs, docker, springboot)☆12Dec 16, 2022Updated 3 years ago
- https://aka.ms/lakehouselab☆23Feb 14, 2023Updated 3 years ago
- ☆18May 11, 2023Updated 2 years ago
- ☆41Jul 4, 2022Updated 3 years ago
- A Microsoft Power BI Custom Connector allowing you to import Trino data into Power BI.☆90Jan 8, 2025Updated last year
- ☆18Jul 25, 2024Updated last year
- Spark data pipeline that processes movie ratings data.☆31Mar 1, 2026Updated 3 weeks ago
- Repo which holds the materials for the EMR Zero To Hero☆27May 7, 2022Updated 3 years ago
- an end-to-end data pipeline extracting music listening habits and producing an insightful dashboard☆17Mar 31, 2024Updated last year
- This project uses Google Analytics 4 BigQuery Exports as its source data, and offers useful base transformations to provide report-ready …☆20Sep 30, 2022Updated 3 years ago
- 🌟 An end-to-end full-stack data science project, including modelling, MLOps, and data storytelling. ✨☆16Aug 30, 2025Updated 6 months ago
- ☆81Apr 23, 2025Updated 11 months ago
- ☆11Feb 27, 2024Updated 2 years ago
- Udacity Data Engineering Nanodegree Project 3☆12Jul 14, 2019Updated 6 years ago
- AKS Course - Pluralsight☆10Oct 29, 2022Updated 3 years ago
- This repo demonstrates how to capture any incoming request and write it as JSON to nginx log using Nginx and Lua. For more details refer …☆12May 22, 2017Updated 8 years ago
- Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from…☆35Jan 5, 2023Updated 3 years ago
- List of customized dotfile configs.☆11May 14, 2025Updated 10 months ago
- Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.☆20Jan 11, 2018Updated 8 years ago
- StarCraft 2 Data Pipeline with Airflow, DuckDB and Streamlit☆16Mar 14, 2024Updated 2 years ago
- Limit Order Book Convolutional Neural Network trading bot☆14Jul 24, 2022Updated 3 years ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline☆76Feb 15, 2023Updated 3 years ago
- The MapsIndoors SDK is the idea of integrating everything on your venue, like people, goods, offices, shops, rooms and buildings with the…☆25Oct 1, 2024Updated last year
- Reads an HBase table and writes it out as Text, Seq, Avro, or Parquet☆28May 15, 2014Updated 11 years ago
- In this project I have built an ETL pipeline which scrapes the trending repositories based on month, week and day LIVE extract other related info…☆12Sep 9, 2023Updated 2 years ago
- Code for youtube channel☆10Apr 15, 2022Updated 3 years ago
- Mass Suricata rules creator, from a list of domains☆14Sep 14, 2018Updated 7 years ago
- Google Cloud Platform solution that provides an event driven process that flattens (unnests) Google Analytics 360 data that has been expo…☆16Sep 9, 2021Updated 4 years ago