Batch data pipeline with Airflow, DuckDB, Delta Lake, Trino, MinIO, and Metabase. Full observability and data quality.
★88 · Nov 5, 2025 · Updated 6 months ago
Alternatives and similar repositories for batch-data-pipeline
Users that are interested in batch-data-pipeline are comparing it to the libraries listed below.
- Question and Answer application using AWS Bedrock, AWS ECS, LangChain, Qdrant, and FastAPI ★15 · Feb 27, 2024 · Updated 2 years ago
- Hexagonal (ports and adapters) architecture applied to a Spark and Python data engineering project ★33 · Jul 26, 2023 · Updated 2 years ago
- My dotfiles used on macOS (Arch Linux on `linux` branch). Includes custom scripts and configs for NeoVim, ZSH, Tmux, Ranger, and more. ★10 · Feb 9, 2026 · Updated 3 months ago
- An open and introductory book for the Python API of Apache Spark (pyspark) ★12 · Sep 19, 2025 · Updated 8 months ago
- ★15 · Mar 29, 2024 · Updated 2 years ago
- A portable Datamart and Business Intelligence suite built with Docker, Airflow, dbt, DuckDB and Superset ★49 · Apr 5, 2026 · Updated last month
- Cutting-edge, opinionated, and ambitious project builder for power users and researchers. ★16 · Feb 2, 2026 · Updated 3 months ago
- Data Engineering Projects using Mage.ai as orchestrator ★20 · Jan 20, 2026 · Updated 4 months ago
- A testing ground for Plotly Dash app development including app features and experimenting with dashboard visualizations. ★10 · Oct 15, 2023 · Updated 2 years ago
- Use MobileNet SSD and OpenCV to detect and count cars on the road ★11 · Jan 13, 2020 · Updated 6 years ago
- Realistic OLTP data simulator for CDC testing with Debezium ★17 · Nov 5, 2025 · Updated 6 months ago
- A Python script to convert your YouTube URL to an mp3 file and download it to the same directory as the .py file. ★10 · May 20, 2025 · Updated last year
- Transcribe speech to text, then receive a virtual assistant response to what you say from OpenAI ★16 · Sep 23, 2022 · Updated 3 years ago
- Deploy a complete data stack in just a couple of minutes. ★15 · Mar 6, 2024 · Updated 2 years ago
- Multi-threaded simple proxy server in Python with file caching ★11 · Oct 4, 2020 · Updated 5 years ago
- ★13 · Sep 23, 2023 · Updated 2 years ago
- ★13 · May 11, 2026 · Updated 2 weeks ago
- Practice notebooks for NumPy, Pandas, matplotlib, basic machine learning etc. ★13 · Nov 20, 2017 · Updated 8 years ago
- Integrating Apache Airflow, dbt, Great Expectations and Apache Superset to develop a modern open source data stack. ★18 · Jun 19, 2022 · Updated 3 years ago
- End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, dbt, Trino, Lightdash, Hive metastore,… ★48 · Oct 14, 2024 · Updated last year
- ★16 · Apr 18, 2025 · Updated last year
- Learning-by-doing data model built with dbt-core ★17 · Apr 10, 2026 · Updated last month
- ★11 · Nov 21, 2023 · Updated 2 years ago
- ★31 · Aug 21, 2025 · Updated 9 months ago
- ★11 · Feb 24, 2022 · Updated 4 years ago
- ★15 · Apr 14, 2026 · Updated last month
- ★16 · Nov 27, 2025 · Updated 6 months ago
- CMU 15-712 lecture slides ★11 · Jan 6, 2020 · Updated 6 years ago
- http://archive.ics.uci.edu/ml/index.html ★12 · Jan 25, 2020 · Updated 6 years ago
- ★13 · Dec 28, 2023 · Updated 2 years ago
- ★18 · May 27, 2025 · Updated last year
- ★11 · Jan 31, 2019 · Updated 7 years ago
- Create an Anime database containing all the Anime currently available on the website, which includes: 'Anime Title', 'Description', 'C… ★12 · Jun 10, 2020 · Updated 5 years ago
- Batch processing, orchestration using Apache Airflow and Google Workflows, Spark Structured Streaming and a lot more ★18 · Jun 21, 2022 · Updated 3 years ago
- ★16 · Apr 26, 2020 · Updated 6 years ago
- ★10 · Jul 20, 2020 · Updated 5 years ago
- A modern port of the ELIZA conversational program to pure Ink to run as a command line and in the browser. ★16 · Apr 30, 2021 · Updated 5 years ago
- Miscellaneous codes and writings for MLOps ★15 · Apr 8, 2026 · Updated last month
- Interactive web-based dashboard to manage traffic flow using YOLOX and DeepSORT ★12 · Jul 30, 2022 · Updated 3 years ago