Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
☆52Dec 2, 2023Updated 2 years ago
Alternatives and similar repositories for Building-Data-LakeHouse
Users that are interested in Building-Data-LakeHouse are comparing it to the libraries listed below.
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a…☆40Dec 15, 2025Updated 5 months ago
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar…☆44Apr 22, 2023Updated 3 years ago
- velib-v2: An ETL pipeline that employs batch and streaming jobs using Spark, Kafka, Airflow, and other tools, all orchestrated with Docke…☆20Aug 12, 2025Updated 9 months ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin…☆78Sep 2, 2023Updated 2 years ago
- ☆23Feb 5, 2024Updated 2 years ago
- Big Data infrastructure with Hadoop, Spark, Hive and NiFi deployed using Docker Compose. https://doi.org/10.5281/zenodo.18968438☆21Mar 11, 2026Updated 2 months ago
- ☆13Oct 4, 2023Updated 2 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA…☆45Jan 4, 2024Updated 2 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications☆36Dec 15, 2024Updated last year
- trino + hive + minio with postgres in docker compose☆27Aug 18, 2023Updated 2 years ago
- Image building contents for running Spark standalone on Kubernetes☆16Apr 10, 2020Updated 6 years ago
- Lecture: Big Data☆14Oct 27, 2025Updated 7 months ago
- Uses Airflow, Postgres, Kafka, Spark, Cassandra, and GitHub Actions to establish an end-to-end data pipeline☆31Oct 25, 2023Updated 2 years ago
- POC for all the stack of big data (kafka, spark, cassandra, hdfs, docker, springboot)☆12Dec 16, 2022Updated 3 years ago
- A Microsoft Power BI Custom Connector allowing you to import Trino data into Power BI.☆93Jan 8, 2025Updated last year
- ☆18Jul 25, 2024Updated last year
- An end-to-end data pipeline extracting music listening habits and producing an insightful dashboard☆18Mar 31, 2024Updated 2 years ago
- Runner-up team (2nd place) in AI4VN2022: Air Quality Forecasting Challenge☆31Jul 12, 2023Updated 2 years ago
- ☆81Apr 23, 2025Updated last year
- This repo demonstrates how to capture any incoming request and write it as JSON to nginx log using Nginx and Lua. For more details refer …☆12May 22, 2017Updated 9 years ago
- Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from…☆35Jan 5, 2023Updated 3 years ago
- A list of customized dotfile configs.☆11May 14, 2025Updated last year
- Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.☆20Jan 11, 2018Updated 8 years ago
- StarCraft 2 Data Pipeline with Airflow, DuckDB and Streamlit☆16Mar 14, 2024Updated 2 years ago
- Limit Order Book Convolutional Neural Network trading bot☆14Jul 24, 2022Updated 3 years ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline☆77Feb 15, 2023Updated 3 years ago
- The MapsIndoors SDK is the idea of integrating everything on your venue, like people, goods, offices, shops, rooms and buildings with the…☆25Oct 1, 2024Updated last year
- In this project I have built an ETL pipeline which scrapes the trending repositories by month, week and day LIVE extract other related info…☆12Sep 9, 2023Updated 2 years ago
- Code for youtube channel☆10Apr 15, 2022Updated 4 years ago
- Mass Suricata rules creator, from a list of domains☆14Sep 14, 2018Updated 7 years ago
- Repo for learning DBT with Snowflake, featuring projects and models for data transformation and automation☆26Mar 31, 2025Updated last year
- Google Cloud Platform solution that provides an event driven process that flattens (unnests) Google Analytics 360 data that has been expo…☆16Apr 13, 2026Updated last month
- Apache Beam Python examples and templates.☆14Dec 8, 2022Updated 3 years ago
- ☆10Nov 25, 2021Updated 4 years ago
- Vietnamese Large Language Model (LLM) fine-tuned for the task of Question Answering within the medical and healthcare domain☆26Mar 1, 2024Updated 2 years ago
- Xóm's book library - Free & Public 😎☆199Sep 27, 2025Updated 8 months ago
- ☆12Jul 11, 2022Updated 3 years ago
- Command line client for the Fugue API☆14Mar 7, 2023Updated 3 years ago
- Resources backing the Feast fraud tutorial on GCP☆14May 31, 2022Updated 3 years ago