Data Forge — a modern data stack playground to practice flows and best practices, not just tools. Spark, Trino, Kafka, Iceberg, ClickHouse, Airflow, MinIO, Superset — all wired together locally with Docker Compose.
☆174Oct 11, 2025Updated 6 months ago
Alternatives and similar repositories for data-forge
Users that are interested in data-forge are comparing it to the libraries listed below.
- Distributed run of dbt models using Airflow☆169Apr 14, 2026Updated 3 weeks ago
- Data Engineering Digest☆29Jun 24, 2024Updated last year
- Building a Modern Data Lake with Minio, Spark, Airflow via Docker.☆23May 11, 2024Updated last year
- Python manager for spark-submit jobs☆10Jan 6, 2024Updated 2 years ago
- Queries the ACCESS_HISTORY and QUERY_HISTORY views, from the SNOWFLAKE.ACCOUNT_USAGE schema, and generates two interactive GraphViz visua…☆12Aug 28, 2023Updated 2 years ago
- Trino Group Provider LDAP is a Trino (formerly Presto SQL) plugin to map user names to groups using an LDAP server☆22Mar 27, 2024Updated 2 years ago
- Roadmap for Data Engineers. The goal of the roadmap is to get you a job!☆490Mar 30, 2026Updated last month
- Snowflake scripts and useful snippets☆16Feb 2, 2025Updated last year
- build dw with dbt☆55Oct 24, 2024Updated last year
- This repo shows, via a real-world use case, how to launch dbt models from a DAG in Apache Airflow.☆14Apr 22, 2026Updated 2 weeks ago
- This code is used to populate the "ODS jobs dump" Telegram bot, and it can be used for any other dumped Slack channel☆14Sep 12, 2022Updated 3 years ago
- Module for pipelines concept in PySpark☆17Mar 27, 2024Updated 2 years ago
- Surfalytics projects on Data Engineering and Analytics☆121Apr 5, 2026Updated last month
- bmstu, IU7-7, Computer Networks (2020)☆15Oct 27, 2021Updated 4 years ago
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics …☆20Nov 12, 2021Updated 4 years ago
- Wrapper over Yolo5Face☆19Nov 26, 2023Updated 2 years ago
- ERD Viewer and SQL Script Generator with a Streamlit App deployed in Snowflake☆28Sep 25, 2023Updated 2 years ago
- Materials for the course Introduction to Data Engineering: Data Pipelines☆10Feb 18, 2024Updated 2 years ago
- Analytics Engineer Course☆20May 17, 2023Updated 2 years ago
- Professional development course - 1 - Mathematics and Python for Data Analysis☆18Sep 12, 2017Updated 8 years ago
- Tools for working with CSV files in IPython.☆10Feb 17, 2016Updated 10 years ago
- A table-type dbt materialization for Snowflake to enable Time Travel☆22Jan 12, 2026Updated 3 months ago
- Find Niquests at https://github.com/jawah/niquests HTTP/2 HTTP/3 QUIC Async☆12Oct 22, 2024Updated last year
- Alerting and monitoring tool for Apache Spark☆23May 20, 2022Updated 3 years ago
- Django-based backend for our learning management system☆468Apr 24, 2026Updated 2 weeks ago
- A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineer…☆587Feb 5, 2026Updated 3 months ago
- Adaptation of the postgres adapter for Greenplum☆36Mar 7, 2024Updated 2 years ago
- ☆43Aug 12, 2025Updated 8 months ago
- Docker Compose Workspace manager☆17Dec 22, 2025Updated 4 months ago
- Task management system☆24Jan 6, 2026Updated 4 months ago
- PIK Comfort for Home Assistant☆18Aug 27, 2022Updated 3 years ago
- ☆38Apr 20, 2026Updated 2 weeks ago
- Self-contained demo using Kafka, Materialize and Metabase to check what's streaming on Twitch. All you need is Docker and Twitch access t…☆25Mar 22, 2022Updated 4 years ago
- Tableau VizQL Analysis in Python☆20Oct 21, 2020Updated 5 years ago
- A configurable cryptocurrency ticker - browser extension☆16Mar 22, 2018Updated 8 years ago
- An opinionated data-centric view of Debezium components. Please log issues at https://github.com/debezium/dbz/issues.☆46Updated this week
- ☁️🔔 webpush notification support for ntfy☆14Jan 7, 2023Updated 3 years ago
- This is an open-source implementation of MixedAE (https://arxiv.org/pdf/2303.17152.pdf)☆22Feb 14, 2025Updated last year
- Open Statistics and Probability Theory course☆22Aug 31, 2025Updated 8 months ago