An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow, and saves it to Azure Data Lake. Further processing takes place in Azure Data Factory, Azure Synapse, and Tableau.
☆32 · Oct 2, 2023 · Updated 2 years ago
Alternatives and similar repositories for FootballDataEngineering
Users who are interested in FootballDataEngineering are comparing it to the repositories listed below.
- This project provides an end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The spark cluste… ☆12 · Oct 11, 2023 · Updated 2 years ago
- This project showcases how to integrate the world of DevOps, focusing on Continuous Integration (CI) and Continuous Deployment (CD) with … ☆15 · Dec 27, 2023 · Updated 2 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA… ☆44 · Jan 4, 2024 · Updated 2 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de… ☆11 · Nov 18, 2023 · Updated 2 years ago
- An end-to-end data engineering pipeline that fetches real-time YouTube analytics and streams them through Kafka for processing with ksqlD… ☆16 · Sep 19, 2023 · Updated 2 years ago
- Includes all the Practice Material and Project ☆21 · May 19, 2025 · Updated 9 months ago
- This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering en… ☆24 · Jan 26, 2024 · Updated 2 years ago
- An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka… ☆316 · Feb 14, 2025 · Updated last year
- Information for potential applicants to MoJ Data Engineering, including links to our work and information about our teams. ☆10 · Sep 18, 2024 · Updated last year
- Helpful Resources for COGS 108 Students ☆10 · Oct 9, 2025 · Updated 4 months ago
- This project shows how to capture changes from a Postgres database and stream them into Kafka ☆41 · May 17, 2024 · Updated last year
- This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data wareh… ☆206 · Oct 23, 2023 · Updated 2 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr… ☆45 · Dec 11, 2023 · Updated 2 years ago
- A few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake developme… ☆12 · Feb 26, 2020 · Updated 6 years ago
- A data pipeline for processing football data using Python and SQL ☆13 · Sep 12, 2023 · Updated 2 years ago
- ☆12 · Aug 8, 2023 · Updated 2 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python… ☆48 · Mar 14, 2024 · Updated last year
- ☆12 · Jan 14, 2023 · Updated 3 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆48 · Dec 4, 2023 · Updated 2 years ago
- Produce Kafka messages, consume them and upload into Cassandra, MongoDB. ☆43 · Sep 26, 2023 · Updated 2 years ago
- A complete computer science study plan to become a software engineer. ☆11 · Feb 17, 2024 · Updated 2 years ago
- Making and developing R packages ☆21 · Jan 6, 2026 · Updated last month
- Project exploring data collection, visualisation and analysis of Sports Statistics. ☆13 · Dec 17, 2020 · Updated 5 years ago
- Dreame Vacuum Map Card for Home Assistant Integration ☆49 · Updated this week
- This is an end-to-end MLOps system ☆34 · Nov 27, 2025 · Updated 3 months ago
- ☆17 · Feb 1, 2025 · Updated last year
- ☆15 · Aug 5, 2023 · Updated 2 years ago
- A course in data warehouse ☆19 · Sep 27, 2025 · Updated 5 months ago
- apache-spark-with-databricks-for-data-engineering ☆100 · Jul 3, 2024 · Updated last year
- Transform data from on-premises SQL Server to Azure Delta Lake Storage for Analytics and Visualization ☆19 · Jul 16, 2023 · Updated 2 years ago
- This is a guided certification project, as part of the Data Science for Social Good initiative ☆18 · Mar 9, 2020 · Updated 5 years ago
- Create streaming data, transfer it to Kafka, modify it with PySpark, and send it to Elasticsearch and MinIO ☆65 · Jul 21, 2023 · Updated 2 years ago
- Toolset for detecting reflected XSS in websites ☆16 · Oct 6, 2018 · Updated 7 years ago
- ☆14 · Oct 1, 2022 · Updated 3 years ago
- With everything I learned from DEZoomcamp from datatalks.club, this project performs batch processing on AWS for the cycling dataset wh… ☆15 · Jan 4, 2026 · Updated last month
- Data Engineering portfolio projects, resources used to study data tools... ☆30 · Mar 25, 2024 · Updated last year
- ☆24 · Aug 28, 2023 · Updated 2 years ago
- ☆15 · Aug 3, 2022 · Updated 3 years ago
- ☆17 · Apr 26, 2024 · Updated last year