An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Azure Data Lake. Other processing takes place on Azure Data Factory, Azure Synapse and Tableau.
☆32Oct 2, 2023Updated 2 years ago
Alternatives and similar repositories for FootballDataEngineering
Users that are interested in FootballDataEngineering are comparing it to the libraries listed below.
- This project provides an end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The spark cluste…☆12Oct 11, 2023Updated 2 years ago
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA…☆44Jan 4, 2024Updated 2 years ago
- This project showcases how to integrate the world of DevOps, focusing on Continuous Integration (CI) and Continuous Deployment (CD) with …☆14Dec 27, 2023Updated 2 years ago
- This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project de…☆12Nov 18, 2023Updated 2 years ago
- An end-to-end data engineering pipeline that fetches real-time YouTube analytics and streams them through Kafka for processing with ksqlD…☆16Sep 19, 2023Updated 2 years ago
- This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering en…☆25Jan 26, 2024Updated 2 years ago
- ☆15Feb 14, 2025Updated last year
- An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka…☆323Feb 14, 2025Updated last year
- Repo to store my simulation of FIFA 2022 World Cup using Machine Learning☆14Nov 15, 2022Updated 3 years ago
- This project shows how to capture changes from a PostgreSQL database and stream them into Kafka☆42May 17, 2024Updated last year
- A data pipeline for processing football data using Python and SQL☆13Sep 12, 2023Updated 2 years ago
- Early detection of lung cancer using the LUNA data set with LIDC-IDRI annotations, using two models: nodule classification ("GoogLeNet model") …☆16Sep 30, 2022Updated 3 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar…☆50Dec 4, 2023Updated 2 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr…☆47Dec 11, 2023Updated 2 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python…☆48Mar 14, 2024Updated 2 years ago
- Produce Kafka messages, consume them and upload into Cassandra, MongoDB.☆43Sep 26, 2023Updated 2 years ago
- Snowflake - Build and Architect Data Pipelines using AWS, published by Packt☆23Apr 3, 2023Updated 3 years ago
- ms-dataverse is a Python module for Microsoft Dataverse, offering a lightweight ORM to query, create, update, and delete entities. Utiliz…☆13Apr 10, 2023Updated 3 years ago
- This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data wareh…☆214Oct 23, 2023Updated 2 years ago
- The official documentation of the City of Boston's Analytics Team.☆13Jan 21, 2025Updated last year
- Create streaming data, transfer it to Kafka, modify it with PySpark, and load it into Elasticsearch and MinIO☆65Jul 21, 2023Updated 2 years ago
- ☆19Feb 1, 2025Updated last year
- ☆17Mar 10, 2025Updated last year
- This repository showcases a collection of machine learning projects in various domains, demonstrating my skills and expertise as a data s…☆11Nov 20, 2023Updated 2 years ago
- This is an end to end MLOps system☆34Nov 27, 2025Updated 5 months ago
- Transform data from on-premises SQL Server to Azure Delta Lake Storage for Analytics and Visualization☆25Jul 16, 2023Updated 2 years ago
- ☆32Nov 14, 2024Updated last year
- ☆12Jan 14, 2023Updated 3 years ago
- ☆28Jul 9, 2025Updated 9 months ago
- ☆18May 11, 2023Updated 2 years ago
- ☆41Dec 5, 2023Updated 2 years ago
- Functional Data Engineering tutorial in Python & Airflow.☆17Mar 24, 2023Updated 3 years ago
- Local SQL Database ---> Azure ---> Power BI☆15Oct 13, 2023Updated 2 years ago
- Project exploring data collection, visualisation and analysis of Sports Statistics.☆14Dec 17, 2020Updated 5 years ago
- ☆15Aug 5, 2023Updated 2 years ago
- With everything I learned from DEZoomcamp from datatalks.club, this project performs a batch processing on AWS for the cycling dataset wh…☆15Jan 4, 2026Updated 3 months ago
- ☆30Jul 29, 2023Updated 2 years ago
- End-to-end Data Project (DA/DS/DE/MLOps) - retail/e-commerce - interpretable dynamic clustering☆20Jul 12, 2025Updated 9 months ago
- Data Engineering portfolio projects, resources used to study data tools...☆30Mar 25, 2024Updated 2 years ago