Spark, Airflow, Kafka
☆24Apr 30, 2023Updated 3 years ago
Alternatives and similar repositories for data-engineering
Users that are interested in data-engineering are comparing it to the libraries listed below.
- This repository shows my personal notes taken while doing the Udacity Data Engineering Nanodegree☆13May 28, 2020Updated 5 years ago
- Data Engineering Capstone Project: ETL Pipelines and Data Warehouse Development☆22Jul 9, 2019Updated 6 years ago
- This is a simple ETL project with Python :)☆40Oct 31, 2022Updated 3 years ago
- ☆10May 24, 2021Updated 4 years ago
- The Data Pipeline and Analytics Stack is a comprehensive solution designed for processing, storing, and visualizing data. Explore a compl…☆18Dec 26, 2023Updated 2 years ago
- Notes for the book - Head First Design Patterns☆15Mar 21, 2021Updated 5 years ago
- A simple tool for monitoring the progress of OpenFOAM simulations☆13Nov 9, 2018Updated 7 years ago
- This project aims to build a streaming application to perform real-time analytics of Covid-19 related tweets and deploy an ML model for r…☆14Jul 15, 2021Updated 4 years ago
- Starting with Cassandra on Python Flask☆17Mar 19, 2021Updated 5 years ago
- Chrome Extension for Development/Testing/Exploring GraphQL Servers☆14Oct 1, 2018Updated 7 years ago
- Matlab toolbox for generating block structured hex meshes in the polyMesh file format of OpenFOAM.☆13Jan 2, 2013Updated 13 years ago
- Candidate solution for Facebook's fake news problem using machine learning and crowd-sourcing. Takes form of a Chrome extension. Develope…☆13Aug 25, 2017Updated 8 years ago
- Project based learning for Data Engineering fundamentals.☆13Jan 15, 2021Updated 5 years ago
- Airflow ETL for Meetup API☆45Dec 27, 2018Updated 7 years ago
- Data Science for Good links.☆14Nov 10, 2021Updated 4 years ago
- Code for my blogs on Data Engineering☆15Nov 9, 2020Updated 5 years ago
- An end-to-end data engineering pipeline to create a dashboard for the latest content on the r/Stocks subreddit☆20Aug 5, 2022Updated 3 years ago
- A way for home buyers to know about factors affecting a state☆48Mar 2, 2019Updated 7 years ago
- My professional portfolio with some of my best data science projects.☆11Jun 22, 2017Updated 8 years ago
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa…☆29Jun 7, 2023Updated 2 years ago
- Group Project: CFD solver taking heat into account, with transport of chemical substances and chemical reactions.☆12Oct 24, 2017Updated 8 years ago
- Source Code for 'Beginning Apache Spark 3' by Hien Luu☆13Oct 14, 2021Updated 4 years ago
- RedditR for Content Engagement and Recommendation☆18Dec 21, 2017Updated 8 years ago
- Python wrapper for OpenFOAM meshes☆13Sep 16, 2025Updated 7 months ago
- Scripts and code written whilst learning and experimenting with machine learning☆13Jul 18, 2022Updated 3 years ago
- Just a boilerplate for PySpark and Flask☆36Aug 2, 2018Updated 7 years ago
- Building Data Warehouse on BigQuery which takes flat file as the data sources with Airflow as the Orchestrator☆12May 23, 2021Updated 4 years ago
- Docker powered container for using Nginx as reverse-proxy in combination with an OpenVPN Client.☆11Jan 1, 2020Updated 6 years ago
- A python script to convert your youtube URL to an mp3 file and download it to the same directory as the .py file.☆10May 20, 2025Updated 11 months ago
- Distributed Data Systems with Azure Databricks, published by Packt☆12Jan 18, 2023Updated 3 years ago
- Multi-threaded simple proxy server in Python with file caching☆11Oct 4, 2020Updated 5 years ago
- My MSc project☆14Jun 5, 2011Updated 14 years ago
- All Data Engineering notebooks from Datacamp course☆116Dec 11, 2019Updated 6 years ago
- A repo to track data engineering projects☆13Nov 11, 2022Updated 3 years ago
- Integrating Apache Airflow, dbt, Great Expectations and Apache Superset to develop a modern open source data stack.☆16Jun 19, 2022Updated 3 years ago
- 😈Complete End to End ETL Pipeline with Spark, Airflow, & AWS☆52Aug 23, 2019Updated 6 years ago
- Data processing and visualization of (crypto) currencies dynamics and technical indicator☆12Apr 16, 2020Updated 6 years ago
- ☆11Nov 21, 2023Updated 2 years ago
- Data Engineering pipeline hosted entirely in the AWS ecosystem utilizing DocumentDB as the database☆14Oct 26, 2021Updated 4 years ago