dagster-io / dagster
An orchestration platform for the development, production, and observation of data assets.
☆11,155 · Updated this week
Related projects:
- Prefect is a workflow orchestration framework for building resilient data pipelines in Python. ☆15,830 · Updated this week
- dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build application… ☆9,626 · Updated this week
- Always know what to expect from your data. ☆9,817 · Updated this week
- The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lak… ☆15,484 · Updated this week
- A modular SQL linter and auto-formatter with support for multiple dialects and templated code. ☆7,642 · Updated this week
- the portable Python dataframe library ☆5,064 · Updated this week
- Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting… ☆4,391 · Updated last week
- Build and manage real-life ML, AI, and data science projects with ease! ☆8,046 · Updated this week
- Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. ☆5,418 · Updated this week
- Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and… ☆9,837 · Updated this week
- DuckDB is an analytical in-process SQL database management system ☆22,674 · Updated this week
- Python SQL Parser and Transpiler ☆6,395 · Updated this week
- 🧙 Build, run, and manage data pipelines for integrating and transforming data. ☆7,722 · Updated this week
- Apache Airflow - A platform to programmatically author, schedule, and monitor workflows ☆36,304 · Updated this week
- Build data pipelines, the easy way 🛠️ ☆4,055 · Updated last year
- The Open Source Feature Store for Machine Learning ☆5,476 · Updated this week
- Dataframes powered by a multithreaded, vectorized query engine, written in Rust ☆29,261 · Updated this week
- 🦉 ML Experiments and Data Management with Git ☆13,608 · Updated this week
- Open source platform for the machine learning lifecycle ☆18,340 · Updated this week
- A light-weight, flexible, and expressive statistical data testing library ☆3,258 · Updated this week
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… ☆17,705 · Updated last week
- Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing ☆14,283 · Updated this week
- data load tool (dlt) is an open source Python library that makes data loading easy 🛠️ ☆2,307 · Updated this week
- Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to wr… ☆1,779 · Updated this week
- The Metadata Platform for your Data Stack ☆9,674 · Updated this week
- Self-serve BI to 10x your data team ⚡️ ☆3,743 · Updated this week
- Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io ☆1,866 · Updated last week
- Parallel computing with task scheduling ☆12,405 · Updated this week
- Compare tables within or across databases ☆2,933 · Updated 4 months ago
- Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, v… ☆3,789 · Updated this week