iam-mhaseeb / Skytrax-Data-Warehouse
A full data warehouse infrastructure with ETL pipelines running inside Docker, using Apache Airflow for data orchestration, AWS Redshift as the cloud data warehouse, and Metabase for data visualization such as analytical dashboards.
☆141Apr 18, 2020Updated 5 years ago
Alternatives and similar repositories for Skytrax-Data-Warehouse
Users that are interested in Skytrax-Data-Warehouse are comparing it to the libraries listed below
- Tracking and measuring neighborhood and district-level eviction rates in the city of San Francisco.☆140Jul 14, 2020Updated 5 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow☆162Jun 16, 2020Updated 5 years ago
- Airflow ETL for Meetup API☆45Dec 27, 2018Updated 7 years ago
- RedditR for Content Engagement and Recommendation☆18Dec 21, 2017Updated 8 years ago
- Pyspark Spotify ETL☆17Aug 19, 2021Updated 4 years ago
- Tough and flexible tools for data analysis, transformation, validation and movement.☆140Jan 26, 2024Updated 2 years ago
- A collection of data analysis projects done using PySpark via Jupyter notebooks.☆10Oct 8, 2022Updated 3 years ago
- A few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake developme…☆1,814Aug 26, 2022Updated 3 years ago
- Run greatexpectations.io on ANY SQL Engine using REST API. Supported by FastAPI, Pydantic and SQLAlchemy as best data quality tool☆14Dec 12, 2025Updated 2 months ago
- ☆10May 24, 2021Updated 4 years ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,…☆89Nov 22, 2021Updated 4 years ago
- Built a Data Pipeline for a Retail store using AWS services that collects data from its transactional database (OLTP) in Snowflake and tr…☆11May 25, 2023Updated 2 years ago
- Runnable e-commerce mini data warehouse based on Python, PostgreSQL & Metabase, template for new projects☆29Mar 31, 2021Updated 4 years ago
- Beginner data engineering project - batch edition☆564Jan 22, 2025Updated last year
- A template for Python projects that need to use a relational database, including tooling for managing schema migrations and testing again…☆13Dec 13, 2024Updated last year
- Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, DBT, Mage AI, and Dash.☆15Jun 26, 2023Updated 2 years ago
- Template for a DuckDB-based, Codespace-oriented sandbox project that is also dbt Cloud compatible, and includes code-first BI tooling via…☆16Apr 7, 2023Updated 2 years ago
- An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.☆1,485Mar 9, 2020Updated 5 years ago
- Example end to end data engineering project.☆1,384Dec 8, 2022Updated 3 years ago
- Use AWS Lambda to Pull E-Scooter and E-Bike Location Data, store in S3 & Redshift using Data Vault Data Model, Server to Google Data Stud…☆16Jun 12, 2022Updated 3 years ago
- Personal Data Engineering Projects☆989Feb 8, 2023Updated 3 years ago
- Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database☆14Oct 26, 2021Updated 4 years ago
- ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS)☆17Dec 18, 2018Updated 7 years ago
- Nicely modeled data built on the Github Archive.☆69Jan 23, 2026Updated 3 weeks ago
- Open-source metadata collector based on ODD Specification☆44Nov 6, 2023Updated 2 years ago
- Notes and scripts for web scraping, screen scraping, etc.☆18Mar 21, 2018Updated 7 years ago
- ☆198Feb 25, 2022Updated 3 years ago
- Send Slack messages spiced with data from Snowflake☆17Jan 27, 2022Updated 4 years ago
- My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on lambda architecture, that aggrega…☆508Aug 24, 2022Updated 3 years ago
- Data engineering interviews Q&A for data community by data community☆66Jun 7, 2020Updated 5 years ago
- Any Airflow project day 1, you can spin up a local desktop Kubernetes Airflow environment AND one in Google Cloud Composer with tested da…☆113Sep 21, 2023Updated 2 years ago
- Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.☆347Jan 12, 2022Updated 4 years ago
- Constructed a dashboard with FastAPI that extracts data from the yfinance API to a SQLAlchemy database.☆21Mar 16, 2025Updated 11 months ago
- A set of coding challenges for various engineering roles at Isentia☆21Sep 14, 2021Updated 4 years ago
- Gradient Boosting Models on Real-Time Sensor Data for AI-Enhanced Vehicle Predictive Maintenance. By using a web-based interface to forec…☆19Nov 17, 2024Updated last year
- Developed an ETL pipeline for a Data Lake that extracts data from S3, processes the data using Spark, and loads the data back into S3 as …☆17Oct 1, 2019Updated 6 years ago
- pyspark methods to enhance developer productivity 📣 👯 🎉☆682Mar 6, 2025Updated 11 months ago
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR☆88Jul 17, 2019Updated 6 years ago
- Spark application for analyzing Apache access logs and detecting anomalies, along with a Medium article.☆21Jan 30, 2019Updated 7 years ago