Building a modern data lake with MinIO, Spark, and Airflow via Docker.
☆23May 11, 2024Updated last year
Alternatives and similar repositories for docker-airflow-spark
Users who are interested in docker-airflow-spark are comparing it to the repositories listed below
- Open episode of the data engineering practice course☆32Jul 2, 2024Updated last year
- The simple ETL with docker container☆66May 30, 2025Updated 9 months ago
- ☆19Feb 25, 2022Updated 4 years ago
- Writes a CSV file to Postgres, reads the table, and modifies it. Writes more tables to Postgres with Airflow.☆37Sep 1, 2023Updated 2 years ago
- A Python package extending pandas with helper functions for simpler exploratory data analysis and data wrangling.☆10Feb 6, 2025Updated last year
- A compilation of components to optimize the development of your ecommerce☆13Jun 23, 2025Updated 8 months ago
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore,…☆48Oct 14, 2024Updated last year
- Soname ALerts & MONitoring☆19Jan 21, 2025Updated last year
- ☆41Jan 24, 2023Updated 3 years ago
- Data pipeline to build a data warehouse on Postgres☆14Aug 11, 2024Updated last year
- This is the HTML-CSS source code to build my personal website.☆10Nov 13, 2025Updated 3 months ago
- Data Analysis and Image Processing Python Course☆12Nov 4, 2014Updated 11 years ago
- ☆10Jan 24, 2023Updated 3 years ago
- Classify images of different kitchenware items☆11Apr 17, 2023Updated 2 years ago
- Codes for the paper "Residuals-based Distributionally Robust Optimization with Covariate Information"☆10Aug 13, 2022Updated 3 years ago
- FSUIPC external application interface and listening tools written in Node.js☆14May 15, 2023Updated 2 years ago
- A script/docker that automatically translates PDFs using the DeepL API☆11Jan 18, 2026Updated last month
- Modern games store web application built with React and Spring☆11Dec 15, 2023Updated 2 years ago
- This repo contains tools that allow us to import, clean, manipulate, and visualize data. Includes Python libraries, like pandas, NumPy, M…☆13Jul 7, 2024Updated last year
- Integrating Apache Airflow, dbt, Great Expectations and Apache Superset to develop a modern open source data stack.☆16Jun 19, 2022Updated 3 years ago
- The Modern Data Stack in a (Smaller) Box☆12Jan 28, 2023Updated 3 years ago
- ☆45Feb 13, 2026Updated 3 weeks ago
- This project aims to move the data from a Relational database system (RDBMS) to a Hadoop file system (HDFS)☆11Apr 29, 2022Updated 3 years ago
- The "World Data Report" is a Power BI project that offers a detailed overview of global data, covering weather, geographical, demographic…☆15Nov 30, 2025Updated 3 months ago
- Traditionally, engineers were needed to implement business logic via data pipelines before business users can start using it. Using this …☆12Feb 26, 2026Updated last week
- A fast development template for Admin-dashboard based on Ext JS Classic toolkit☆10Jun 29, 2018Updated 7 years ago
- MinIO as local storage and DynamoDB as catalog☆15May 14, 2024Updated last year
- This code is used to populate the "ODS jobs dump" Telegram bot, and it can be used for any other dumped Slack channel☆14Sep 12, 2022Updated 3 years ago
- Spark Standalone & Livy☆11Jul 13, 2021Updated 4 years ago
- granadoespadav32 private server setup☆17Jan 24, 2024Updated 2 years ago
- Various utilities and info for King's Raid☆16Mar 30, 2021Updated 4 years ago
- Python tool for profiling-based anomaly monitoring on ETL data pipelines leveraging ML and Apache Spark.☆16Mar 5, 2024Updated 2 years ago
- The application provides a RESTful API that allows clients to upload files (pdf, csv, txt), generates a conversational retrieval model us…☆13Jul 8, 2023Updated 2 years ago
- Accompanying code for our NeurIPS 2019 paper☆12Nov 7, 2019Updated 6 years ago
- ☆17Updated this week
- Process manager and website for hosting multiple Streamlit apps☆14Jun 28, 2023Updated 2 years ago
- Deploy a complete data stack in just a couple of minutes.☆15Mar 6, 2024Updated 2 years ago
- A custom end-to-end analytics platform for customer churn☆11May 15, 2025Updated 9 months ago
- Dockerizing and Consuming an Apache Livy environment☆13Jun 29, 2022Updated 3 years ago