Data Forge: a modern data stack playground for practicing data flows and best practices, not just tools. Spark, Trino, Kafka, Iceberg, ClickHouse, Airflow, MinIO, and Superset, all wired together locally with Docker Compose.
☆ 173 · Oct 11, 2025 · Updated 6 months ago
Alternatives and similar repositories for data-forge
Users who are interested in data-forge are comparing it to the repositories listed below.
- Distributed run of dbt models using Airflow ☆ 169 · Updated this week
- Data catalog for everything in your company ☆ 50 · Jun 5, 2023 · Updated 2 years ago
- DE or DIE meetup, made by data engineers for data engineers. Currently in Russian only. ☆ 58 · Jan 6, 2024 · Updated 2 years ago
- Using Debezium for streaming data processing: core concepts and examples. ☆ 20 · Apr 12, 2025 · Updated last year
- Building a Modern Data Lake with MinIO, Spark, and Airflow via Docker. ☆ 23 · May 11, 2024 · Updated last year
- Docker Compose with Almond.sh core for Jupyter ☆ 18 · Sep 1, 2024 · Updated last year
- Terraform module for creating EKS clusters optimized for ClickHouse® with EBS and autoscaling ☁️ ☆ 27 · Mar 25, 2026 · Updated 3 weeks ago
- Real-time monitoring of running, queued, and blocked queries in Snowflake ☆ 56 · Aug 25, 2025 · Updated 7 months ago
- Code from the lectures of a backend mini-course ☆ 15 · Mar 16, 2019 · Updated 7 years ago
- Spark in Kubernetes ☆ 39 · Jun 3, 2024 · Updated last year
- This code is used to populate the "ODS jobs dump" Telegram bot, and it can be used for any other dumped Slack channel ☆ 14 · Sep 12, 2022 · Updated 3 years ago
- Python course ☆ 39 · Updated this week
- Demonstration Database ☆ 38 · Apr 2, 2026 · Updated 2 weeks ago
- ITSumma Spark Greenplum Connector ☆ 43 · Mar 17, 2026 · Updated last month
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ☆ 20 · Nov 12, 2021 · Updated 4 years ago
- Wrapper around Yolo5Face ☆ 19 · Nov 26, 2023 · Updated 2 years ago
- Analytics Engineer Course ☆ 20 · May 17, 2023 · Updated 2 years ago
- Explore the dbt Semantic Layer ☆ 31 · May 26, 2025 · Updated 10 months ago
- A table-type dbt materialization for Snowflake to enable Time Travel ☆ 22 · Jan 12, 2026 · Updated 3 months ago
- Find Niquests at https://github.com/jawah/niquests (HTTP/2, HTTP/3, QUIC, async) ☆ 12 · Oct 22, 2024 · Updated last year
- A bot that sends a daily digest of closed issues to our team ☆ 51 · Jun 9, 2019 · Updated 6 years ago
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ… ☆ 190 · Jan 5, 2026 · Updated 3 months ago
- An implementation of the Pregel framework, and of graph algorithms built on top of it, using Ibis project DataFrames. ☆ 23 · Apr 7, 2025 · Updated last year
- Open-source data lake ☆ 32 · Jan 17, 2025 · Updated last year
- A free-to-use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineer… ☆ 584 · Feb 5, 2026 · Updated 2 months ago
- Docker Compose Workspace manager ☆ 17 · Dec 22, 2025 · Updated 3 months ago
- Simple, high-performance TCP-level load balancer / reverse proxy made in Rust ☆ 11 · Apr 1, 2026 · Updated 2 weeks ago
- Task management system ☆ 24 · Jan 6, 2026 · Updated 3 months ago
- PIK Comfort for Home Assistant ☆ 18 · Aug 27, 2022 · Updated 3 years ago
- Minimalistic VSCode theme inspired by old-fashioned hobbies. ☆ 72 · Apr 11, 2026 · Updated last week
- Describe business metrics with YAML, then query and visualize them in Jupyter with zero SQL ☆ 21 · Sep 1, 2022 · Updated 3 years ago
- ☆ 11 · Jan 25, 2021 · Updated 5 years ago
- ☆ 36 · Jan 13, 2026 · Updated 3 months ago
- Tableau VizQL Analysis in Python ☆ 20 · Oct 21, 2020 · Updated 5 years ago
- Simple migration tool for ClickHouse databases ☆ 27 · Updated this week
- A configurable cryptocurrency ticker (browser extension) ☆ 16 · Mar 22, 2018 · Updated 8 years ago
- An opinionated data-centric view of Debezium components. Please log issues at https://github.com/debezium/dbz/issues. ☆ 44 · Updated this week
- An open-source implementation of MixedAE (https://arxiv.org/pdf/2303.17152.pdf) ☆ 22 · Feb 14, 2025 · Updated last year
- Template for Scala Spark with unit tests ☆ 13 · Jul 24, 2023 · Updated 2 years ago