An end-to-end data engineering pipeline that fetches data from Wikipedia, cleans and transforms it with Apache Airflow and saves it on Azure Data Lake. Other processing takes place on Azure Data Factory, Azure Synapse and Tableau.
☆31Oct 2, 2023Updated 2 years ago
Alternatives and similar repositories for FootballDataEngineering
Users that are interested in FootballDataEngineering are comparing it to the libraries listed below.
- This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenA…☆44Jan 4, 2024Updated 2 years ago
- In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, Data Build Tool (DBT) using Azure as our …☆39Dec 18, 2023Updated 2 years ago
- An end-to-end data engineering pipeline that fetches real-time YouTube analytics and streams them through Kafka for processing with ksqlD…☆16Sep 19, 2023Updated 2 years ago
- This repository contains the necessary configuration files and DAGs (Directed Acyclic Graphs) for setting up a robust data engineering en…☆25Jan 26, 2024Updated 2 years ago
- Includes all the Practice Material and Project☆28May 19, 2025Updated last year
- An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Ka…☆325Feb 14, 2025Updated last year
- This project shows how to capture changes from a Postgres database and stream them into Kafka☆42May 17, 2024Updated 2 years ago
- A data pipeline for processing football data using Python and SQL☆13Sep 12, 2023Updated 2 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar…☆51Dec 4, 2023Updated 2 years ago
- This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgr…☆48Dec 11, 2023Updated 2 years ago
- This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages using Python…☆48Mar 14, 2024Updated 2 years ago
- A Python wrapper for the Iterable API☆12Jan 7, 2026Updated 4 months ago
- Quantum Black Hackathon organised by Analytics Vidhya☆13Jul 23, 2019Updated 6 years ago
- Snowflake - Build and Architect Data Pipelines using AWS, published by Packt☆24Apr 3, 2023Updated 3 years ago
- This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data wareh…☆214Oct 23, 2023Updated 2 years ago
- Create streaming data, transfer it to Kafka, modify it with PySpark, and load it into Elasticsearch and MinIO☆65Jul 21, 2023Updated 2 years ago
- ☆18Feb 1, 2025Updated last year
- ☆27Nov 26, 2025Updated 5 months ago
- ☆17Mar 10, 2025Updated last year
- Data pipeline from device to cloud☆11May 14, 2022Updated 4 years ago
- This repository showcases a collection of machine learning projects in various domains, demonstrating my skills and expertise as a data s…☆11Nov 20, 2023Updated 2 years ago
- Toolset for detecting reflected XSS in websites☆16Oct 6, 2018Updated 7 years ago
- Transform data from on-premises SQL Server to Azure Delta Lake Storage for Analytics and Visualization☆26Jul 16, 2023Updated 2 years ago
- Scraper and analyzer of shared scooter data☆11Jul 30, 2024Updated last year
- ☆31Nov 14, 2024Updated last year
- Data Structures and Algorithms☆22May 10, 2026Updated last week
- Python wrapper for Goodreads API☆30Feb 20, 2020Updated 6 years ago
- ☆18May 11, 2023Updated 3 years ago
- Superstore Sales with Streamlit is a data visualization and analysis project that uses the Streamlit framework to create an interactive w…☆23Aug 24, 2023Updated 2 years ago
- Collection of modules for building distributed and reliable concurrent systems in Python.☆207Sep 14, 2013Updated 12 years ago
- Local SQL Database ---> Azure ---> Power BI☆15Oct 13, 2023Updated 2 years ago
- ☆15Aug 5, 2023Updated 2 years ago
- With everything I learned from DEZoomcamp from datatalks.club, this project performs batch processing on AWS for the cycling dataset wh…☆15Jan 4, 2026Updated 4 months ago
- ☆30Jul 29, 2023Updated 2 years ago
- End-to-end Data Project (DA/DS/DE/MLOps) - retail/e-commerce - interpretable dynamic clustering☆21Jul 12, 2025Updated 10 months ago
- A formatter for handling parameters and files uploaded to a Web API controller.☆26Dec 7, 2022Updated 3 years ago
- This project leverages GCS, Composer, Dataflow, BigQuery, and Looker on Google Cloud Platform (GCP) to build a robust data engineering so…☆35Dec 12, 2023Updated 2 years ago
- ☆15Aug 3, 2022Updated 3 years ago
- Password Manager is a simple and secure application designed to store and manage your passwords. Developed using Java, it employs AES enc…☆57Sep 10, 2024Updated last year