Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
☆52 · Dec 2, 2023 · Updated 2 years ago
Alternatives and similar repositories for Building-Data-LakeHouse
Users that are interested in Building-Data-LakeHouse are comparing it to the libraries listed below.
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a… ☆40 · Dec 15, 2025 · Updated 4 months ago
- This project implements an ELT (Extract - Load - Transform) data pipeline with the goodreads dataset, using dagster (orchestration), spar… ☆43 · Apr 22, 2023 · Updated 2 years ago
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆76 · Sep 2, 2023 · Updated 2 years ago
- ☆22 · Feb 5, 2024 · Updated 2 years ago
- Fully automated csv to dashboard pipeline using Terraform, Google Cloud Storage, BigQuery, dbt, Prefect and Looker Studio. Peer ranked … ☆45 · Nov 15, 2024 · Updated last year
- Big Data infrastructure with Hadoop, Spark, Hive and NiFi deployed using Docker Compose. https://doi.org/10.5281/zenodo.18968438 ☆21 · Mar 11, 2026 · Updated last month
- ☆13 · Oct 4, 2023 · Updated 2 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆44 · Jan 4, 2024 · Updated 2 years ago
- Basic framework utilities to quickly start writing production ready Apache Spark applications ☆36 · Dec 15, 2024 · Updated last year
- Lecture: Big Data ☆14 · Oct 27, 2025 · Updated 5 months ago
- POC for all the stack of big data (kafka, spark, cassandra, hdfs, docker, springboot) ☆12 · Dec 16, 2022 · Updated 3 years ago
- Flight data analysis in python using numpy, pandas, matplotlib and seaborn ☆11 · Jul 28, 2018 · Updated 7 years ago
- ☆41 · Jul 4, 2022 · Updated 3 years ago
- Spark data pipeline that processes movie ratings data. ☆31 · Apr 1, 2026 · Updated 2 weeks ago
- Data Guy Story commandline ☆11 · Dec 2, 2022 · Updated 3 years ago
- Spark Structured Streaming data pipeline that processes movie ratings data in real-time. ☆14 · Mar 1, 2026 · Updated last month
- The DataLake GraphQL Wrapper provides a GraphQL API for presto/trino. ☆19 · Apr 17, 2023 · Updated 3 years ago
- ☆13 · Mar 30, 2024 · Updated 2 years ago
- List customize [dot] files config. ☆11 · May 14, 2025 · Updated 11 months ago
- StarCraft 2 Data Pipeline with Airflow, DuckDB and Streamlit ☆16 · Mar 14, 2024 · Updated 2 years ago
- Limit Order Book Convolutional Neural Network trading bot ☆14 · Jul 24, 2022 · Updated 3 years ago
- Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline ☆76 · Feb 15, 2023 · Updated 3 years ago
- The MapsIndoors SDK is the idea of integrating everything on your venue, like people, goods, offices, shops, rooms and buildings with the… ☆25 · Oct 1, 2024 · Updated last year
- In this project I have built an ETL pipeline which scrapes the trending repositories based on month, week and day LIVE extract other related info… ☆12 · Sep 9, 2023 · Updated 2 years ago
- Mass Suricata rules creator, from a list of domains ☆14 · Sep 14, 2018 · Updated 7 years ago
- Google Cloud Platform solution that provides an event driven process that flattens (unnests) Google Analytics 360 data that has been expo… ☆16 · Sep 9, 2021 · Updated 4 years ago
- Machine Learning DevOps Engineer Nanodegree ☆11 · Jan 27, 2022 · Updated 4 years ago
- Sentiment Analysis and Trend Monitoring to Predict Cryptocurrency Market Movements ☆15 · Nov 14, 2018 · Updated 7 years ago
- plan, design and implement enterprise data infrastructure solutions and create the blueprints for an organization’s data management syste… ☆14 · Jun 25, 2023 · Updated 2 years ago
- ☆10 · Nov 25, 2021 · Updated 4 years ago
- ☆14 · Updated this week
- Experiments, results and additional material from "Mastering Java Machine Learning" (PACKT Publishing) ☆14 · Jul 10, 2017 · Updated 8 years ago
- ☆12 · Jul 11, 2022 · Updated 3 years ago
- Command line client for the Fugue API ☆14 · Mar 7, 2023 · Updated 3 years ago
- ☆10 · Jul 19, 2018 · Updated 7 years ago
- Resources backing the Feast fraud tutorial on GCP ☆14 · May 31, 2022 · Updated 3 years ago
- A Django + PyPDF2 application extracting PDF pages, merging and replacing PDF files online. ☆18 · Nov 13, 2018 · Updated 7 years ago
- End-to-end ELT data engineering project ☆22 · Dec 24, 2022 · Updated 3 years ago
- Extension package for dbt to build a metadata table for your dbt models along side your models. ☆15 · Mar 31, 2023 · Updated 3 years ago