Data Forge — a modern data stack playground to practice flows and best practices, not just tools. Spark, Trino, Kafka, Iceberg, ClickHouse, Airflow, MinIO, Superset — all wired together locally with Docker Compose.
☆176 · Oct 11, 2025 · Updated 7 months ago
Alternatives and similar repositories for data-forge
Users interested in data-forge compare it to the repositories listed below.
- Data catalog for everything in your company ☆50 · Jun 5, 2023 · Updated 2 years ago
- Using Debezium for streaming data processing: core concepts and examples. ☆20 · Apr 12, 2025 · Updated last year
- Building a Modern Data Lake with MinIO, Spark, Airflow via Docker. ☆23 · May 11, 2024 · Updated 2 years ago
- Getting Started with Data Engineering ☆1,322 · Apr 20, 2025 · Updated last year
- Greengage DB is an open source MPP database platform based on Greenplum® database software. ☆79 · May 22, 2026 · Updated last week
- Reading both XLSX and XLSB files, fast and memory-safe, with Python, into PyArrow ☆12 · Feb 6, 2024 · Updated 2 years ago
- ☆22 · Dec 18, 2024 · Updated last year
- Workflows and Actions meant to be used by other repositories to make repo maintenance easier ☆20 · May 19, 2026 · Updated last week
- Roadmap for Data Engineers. The goal of this roadmap is to help you land a job! ☆491 · Mar 30, 2026 · Updated last month
- Snowflake scripts and useful snippets ☆16 · Feb 2, 2025 · Updated last year
- Real-time Data Engineering Project ☆31 · Jan 12, 2025 · Updated last year
- Using a real-world use case, this repo shows how to launch dbt models from a DAG in Apache Airflow. ☆14 · Apr 22, 2026 · Updated last month
- Collection of Snowflake Scripting procedures extending the GET_DDL function, by dwh.dev. ☆15 · Jul 23, 2024 · Updated last year
- ☆165 · Mar 3, 2026 · Updated 2 months ago
- This code is used to populate the "ODS jobs dump" Telegram bot, and it can be used for any other dumped Slack channel ☆14 · Sep 12, 2022 · Updated 3 years ago
- ☆18 · Dec 2, 2025 · Updated 5 months ago
- ITSumma Spark Greenplum Connector ☆43 · May 6, 2026 · Updated 3 weeks ago
- CraftML is a RESTful web service for easy pipeline creation without code. ☆13 · Apr 18, 2021 · Updated 5 years ago
- A high-performance image processing library designed to optimize and extend the Albumentations library with specialized functions for adv… ☆104 · May 21, 2026 · Updated last week
- This project is used to capture machine learning pipelines created on top of Spark as OK ☆54 · Nov 1, 2022 · Updated 3 years ago
- FlockFlock: File Access Policy Enforcement for macOS ☆26 · Aug 2, 2016 · Updated 9 years ago
- ERD Viewer and SQL Script Generator with a Streamlit App deployed in Snowflake ☆28 · Sep 25, 2023 · Updated 2 years ago
- A procedure to create a YARN cluster based on Docker, run Spark, and do a TPC-DS performance test. ☆16 · Jan 3, 2024 · Updated 2 years ago
- Analytics Engineer Course ☆20 · May 17, 2023 · Updated 3 years ago
- Alerting and monitoring tool for Apache Spark ☆23 · May 20, 2022 · Updated 4 years ago
- An implementation of the Pregel framework and graph algorithms on top of it with Ibis project DataFrames. ☆23 · May 21, 2026 · Updated last week
- Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application ☆19 · Mar 13, 2023 · Updated 3 years ago
- A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineer… ☆587 · Feb 5, 2026 · Updated 3 months ago
- Adaptation of the Postgres adapter for Greenplum ☆36 · Mar 7, 2024 · Updated 2 years ago
- ☆43 · Aug 12, 2025 · Updated 9 months ago
- Simple implementation of a schema registry for JSON Schema events ☆17 · Apr 8, 2026 · Updated last month
- Docker Compose Workspace manager ☆17 · Dec 22, 2025 · Updated 5 months ago
- Minimalistic VSCode theme inspired by old-fashioned hobbies. ☆71 · Updated this week
- Self-contained demo using Kafka, Materialize and Metabase to check what's streaming on Twitch. All you need is Docker and Twitch access t… ☆25 · Mar 22, 2022 · Updated 4 years ago
- A configurable cryptocurrency ticker browser extension ☆16 · Mar 22, 2018 · Updated 8 years ago
- ☁️🔔 webpush notification support for ntfy ☆14 · Jan 7, 2023 · Updated 3 years ago
- Open Statistics and Probability Theory course ☆22 · Aug 31, 2025 · Updated 8 months ago
- Make dbt great again! Extend dbt with plugins, local docs and custom adapters: fast, safe, and developer-friendly ☆292 · Mar 3, 2026 · Updated 2 months ago
- Mock streaming data generator ☆18 · May 31, 2024 · Updated last year