Building a Data Pipeline with an Open Source Stack
☆57Jun 27, 2025Updated 8 months ago
Alternatives and similar repositories for os-data-stack
Users that are interested in os-data-stack are comparing it to the libraries listed below
- A portable Datamart and Business Intelligence suite built with Docker, Airflow, dbt, duckdb and Superset☆49Mar 9, 2026Updated last week
- Build a data warehouse with dbt☆53Oct 24, 2024Updated last year
- Code for Medium article "Automate Presentation Creation with Python"☆24Aug 6, 2024Updated last year
- XML/XSLT processing in the browser, supported by a Typescript library☆10Feb 18, 2025Updated last year
- Unit testing using databricks connect☆32Nov 3, 2021Updated 4 years ago
- A demo project comparing two web scraping frameworks, Playwright and Selenium, using the new pipelining tool Dagster☆15Sep 9, 2021Updated 4 years ago
- End to End Sales Streaming Pipeline (FastAPI, Kafka, Spark, Cassandra, MySQL, Superset)☆10May 26, 2023Updated 2 years ago
- Deployed a Kafka instance on an AWS EC2 instance to stream data into Databricks☆10Aug 15, 2023Updated 2 years ago
- A repository to store recipes, custom sources, transformations and other things to make your DataHub experience magical☆12Sep 23, 2022Updated 3 years ago
- ☆19May 31, 2025Updated 9 months ago
- How to build an ACP compliant agent that uses MCP as well!☆11May 6, 2025Updated 10 months ago
- A self-contained, ready-to-run Airflow ELT project. Can be run locally or within Codespaces.☆81Aug 21, 2023Updated 2 years ago
- ☆12Aug 13, 2024Updated last year
- Docker compose and Google Colab demo to build a CDC with Delta Lake☆15Sep 7, 2022Updated 3 years ago
- ☆12Jan 18, 2021Updated 5 years ago
- This repo contains DAGs demonstrating a variety of ELT patterns using Airflow along with dbt.☆12Jan 12, 2023Updated 3 years ago
- Metabase Teradata Driver shipped as 3rd party plugin☆11Dec 1, 2025Updated 3 months ago
- Predicting the Stock Market - Can we do it?☆10Jul 24, 2021Updated 4 years ago
- I wrote this roadmap to help close friends; it is open to suggestions!☆14Sep 9, 2025Updated 6 months ago
- Playground site for creating/validating data contracts☆11Aug 9, 2025Updated 7 months ago
- ☆13Sep 5, 2025Updated 6 months ago
- Using Apache Airflow to author, run and monitor complex data pipelines.☆12Oct 24, 2018Updated 7 years ago
- text2sql with modern LLMs (duckdb-nsql, SQLCoder etc ...)☆18Apr 13, 2024Updated last year
- Data Mesh Manager (Community Edition)☆58Oct 24, 2025Updated 4 months ago
- Toy project showing how to set up a data science project☆11Nov 24, 2022Updated 3 years ago
- ☆28Nov 7, 2025Updated 4 months ago
- ☆19Feb 25, 2022Updated 4 years ago
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a…☆38Dec 15, 2025Updated 3 months ago
- Code for DE101 book at https://de101.startdataengineering.com/☆91Feb 22, 2026Updated 3 weeks ago
- For understanding and learning a bit about Apache Kafka. https://www.youtube.com/channel/UC3pevgVzUWKo5CoWdhDsoHw ☆13Mar 10, 2026Updated last week
- ☆10Feb 2, 2024Updated 2 years ago
- ☆10Jul 1, 2024Updated last year
- Classify your e-commerce products into categories of well-known e-commerce platforms. It uses OpenAI embeddings and LangChain.☆19Feb 8, 2024Updated 2 years ago
- Agent Memory Playground: AI Agent Memory Design & Optimization Techniques☆32Aug 7, 2025Updated 7 months ago
- ☆11Feb 13, 2019Updated 7 years ago
- ☆11Aug 20, 2024Updated last year
- ☆66Mar 9, 2026Updated last week
- A script/docker that automatically translates PDFs using the DeepL API☆12Jan 18, 2026Updated 2 months ago
- Data Engineering Project to Extract and Process Solana Reddit Data☆40Feb 3, 2024Updated 2 years ago